Pandas Techniques In Python For Data Manipulation

Nov 18, 20212 min read

Data manipulation with python is defined as a process in the python programming language that enables users in data organization in order to make reading or interpreting the insights from the data more structured and comprises of having better design.

In this post i will explain some pandas techniques in python, Each technique will be demonstrated with a simple code.

Filtering values on the basis of given condition

Apply function

Renaming the columns

Slicing the rows

Sorting a table

Plotting of boxplots and histograms

Getting started

The first step to start using Pandas is importing the library.

import pandas as pd

Now to load and read the data in a DataFrame we use read_csv.

You can find the csv file of this "Iris_trainig" database available on Kaggle Kaggle.

Iris Database: Iris dataset contains 5 columns such as petal length, petal width, sepal length, sepal width and species type.

data= pd.read_csv('iris_training.csv')
data

This code gives the following result:

Filtering values on the basis of given condition

In order to to work with specific data that meets the precise criteria, we would need to use the corresponding data manipulation to meet the conditions.

In this technique we have to use the .loc function, this function allows us to access a group of rows and / or columns using a boolean array or labels.

data.loc[(data["SepalLengthCm"]>=5) & (data["SepalWidthCm"]<=3) & (data["PetalLengthCm"]>1.2), ["Id", "SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "Species"]]

This code gives the following result:

Apply function

to manipulate columns and rows in a DataFrame we use the pandas apply () function.

This function applies the corresponding declared function to the respective axis (0 for column and 1 for rows) and finally return the required variable as per the requirement.

def missingValues(x):
    return sum(x.isnull())
print("Number of missing elements column wise:")
print(data.apply(missingValues, axis=0))
print("\nNumber of missing elements row wise:")
print(data.apply(missingValues, axis=1).head())

This code gives the following result:

Renaming the columns

Let's move on to the Rename function with one of the python pandas library functions.To rename a column you must use the rename () function, creating a dictionary under the name "newcols" to update our new column names.

The following code illustrates that.

newcols={
"Id":"id",
"SepalLengthCm":"sepallength",
"SepalWidthCm":"sepalwidth"}
 
data.rename(columns=newcols,inplace=True)
 
print(data.head())

This code gives the following result:

Slicing the rows

With Slicing we can selects a set of rows and/or columns from a DataFrame.

To slice out a set of rows, we use the following syntax: data[start:stop] .

in slicing in pandas the start bound is included in the output and the stop bound is one step BEYOND the row you want to select.

print(data[10:21])
# it will print the rows from 10 to 20.
 
# you can also save it in a variable for further use in analysis
sliced_data=data[10:21]
print(sliced_data)

This code gives the following result:

Sorting a table

With the use of the “.sort_values” function we can sort a table on the basis of keys which will be passed as a parameter as well as we can pass a list of columns and the table would be sorted on the basis of chronological order.

data_sorted = data.sort_values(['sepallength','sepalwidth'], ascending=False)
data_sorted[['sepallength','sepalwidth']].head(10)

This code gives the following result:

Plotting of boxplots and histograms

now in the The end of manipulation, to explain the dataset and to understand the different statistical parameters of the data and its inference we have to plot boxplot and histogram in order using boxlot, hist functions.

import matplotlib.pyplot as plt
%matplotlib inline
data.boxplot(column="sepallength",by="Species")


data.hist(column="sepallength",by="Species",bins=30)

Thank you for regarding!

You can find the complete source code here Github

datainsightonline.com

Data Scientist Program

Free Online Data Science Training for Complete Beginners.

No prior coding knowledge required!

Pandas Techniques In Python For Data Manipulation

Getting started

Recent Posts

Comments

40 Python Projects with Source Code for Beginners

How to Read Medium Premium Articles for Free

How to use Sqlite3 using Python

Data Visualization - which types of graphs should we use?

Best Online Courses for Data Science

9 Ways to Embed Code Snippets on your Data Science Blog Posts