Pandas Techniques For Data Manipulation

Python is fast becoming a preferred language in data science. It combines the broad ecosystem of a general-purpose programming language with mature scientific computing libraries. Pandas is a popular Python data analysis library. It provides easy-to-use, efficient data structures for working with labeled, tabular data.

In this blog, we will look at some of the most important data manipulation techniques in pandas. For this purpose, we are going to use the Titanic dataset, which is available on Kaggle. The techniques discussed are:

Reading a CSV File
Dropping columns in the data
Dropping rows in the data
Select columns with specific data types
Replacing values in a DataFrame

1. Reading a CSV file

The CSV (Comma Separated Values) format is quite popular for storing data. A large number of datasets are distributed as CSV files, which can be opened directly in software like Excel or loaded with programming languages like Python.

import pandas as pd
# read the csv data using pd.read_csv function
data = pd.read_csv('test.csv')
data.head()
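If the Kaggle file is not at hand, the same call can be tried on an in-memory sample. The sketch below assumes a tiny made-up CSV with a few Titanic-style column names (the rows are illustrative and not taken from the real dataset) and shows two optional parameters of pd.read_csv, usecols and index_col.

```python
import io
import pandas as pd

# Hypothetical sample standing in for test.csv; the values are made up.
csv_text = """PassengerId,Name,Sex,Age,SibSp,Parch,Fare
1,Passenger A,male,34.5,0,0,7.83
2,Passenger B,female,47.0,1,0,7.00
3,Passenger C,male,,0,0,9.69
"""

# usecols keeps only the listed columns;
# index_col uses PassengerId as the row index instead of 0, 1, 2, ...
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["PassengerId", "Sex", "Age", "Fare"],
    index_col="PassengerId",
)

print(sample.shape)         # (3, 3)
print(sample["Age"].isna().sum())  # the empty Age field is read as NaN
```

Loading only the columns you need can be helpful on larger files, since the unused columns are never materialised in the DataFrame.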

DataFrame provides a member function, drop(). It accepts a single label or a list of labels and deletes the corresponding rows or columns, depending on the value of the axis parameter (0 for rows, 1 for columns).

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
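To make the signature more concrete, here is a minimal sketch of how the axis, index, columns and errors parameters relate. It assumes a small made-up DataFrame (the name df and its values are illustrative only).

```python
import pandas as pd

# Hypothetical three-row frame; values are made up for illustration.
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "male"],
        "Parch": [0, 0, 1],
        "Fare": [7.83, 7.0, 9.69],
    }
)

# These two calls are equivalent ways to drop a column.
by_axis = df.drop("Parch", axis=1)
by_keyword = df.drop(columns="Parch")

# index= is the keyword counterpart of axis=0.
without_first_row = df.drop(index=0)

# errors='ignore' skips labels that do not exist instead of raising KeyError.
unchanged = df.drop(columns="Cabin", errors="ignore")

print(by_axis.equals(by_keyword))  # True
print(list(df.columns))            # df still has 'Parch' because inplace=False
```

The columns= and index= keywords tend to read more clearly than a numeric axis, which is one reason to prefer them in longer scripts.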

2. Dropping columns in the data

df_dropped = data.drop('Parch', axis=1)
df_dropped.head()

The ‘Parch’ column is dropped from the data. axis=1 tells drop() that ‘Parch’ is a column label, so it is looked up among the column names. drop() returns a new DataFrame; data itself is unchanged.

We can drop multiple columns at the same time using the following code:

# Drop multiple columns
df_dropped_multiple = data.drop(['SibSp', 'Name'], axis=1)
df_dropped_multiple.head()

The columns ‘SibSp’ and ‘Name’ are dropped from the data.

3. Dropping rows in the data

df_row_dropped = data.drop(2, axis=0)
df_row_dropped.head()

The row with index label 2 is dropped from the data. axis=0 tells drop() that 2 is a row label, so it is looked up in the row index.

We can drop multiple rows at the same time using the following code:

# Drop multiple rows 
df_row_dropped_multiple = data.drop([1,4], axis=0)
df_row_dropped_multiple.head()
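One detail worth illustrating is that drop() matches index labels, not row positions. With the default 0, 1, 2, ... index the two coincide, but they differ once the index is something else. The sketch below assumes a small made-up frame with a non-default index to show the difference, plus a boolean mask as an alternative way to remove rows by condition.

```python
import pandas as pd

# Hypothetical frame with a non-default index; values are made up.
df = pd.DataFrame(
    {"Age": [34.5, 47.0, 62.0, 27.0]},
    index=[10, 11, 12, 13],
)

# drop() matches index labels: this removes the row labelled 11.
by_label = df.drop(11, axis=0)

# To drop by position, look up the labels at those positions first.
by_position = df.drop(df.index[[0, 3]])

# Rows can also be filtered by condition with a boolean mask.
under_40 = df[df["Age"] < 40]

print(list(by_label.index))     # [10, 12, 13]
print(list(by_position.index))  # [11, 12]
print(list(under_40.index))     # [10, 13]
```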

4. Select columns with specific data types

The pandas select_dtypes function lets us specify a data type and select the columns that match it.

#for integer data type
integer_data = data.select_dtypes('int')
integer_data.head()

#for float data type
float_data = data.select_dtypes('float')
float_data.head()

The first call selects all columns with an integer data type, and the second selects all columns with a float data type.
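select_dtypes also takes include and exclude keywords, and the string 'number' covers integer and float columns together. A minimal sketch, assuming a small made-up frame with Titanic-style column names:

```python
import pandas as pd

# Hypothetical frame mixing integer, text and float columns; values are made up.
df = pd.DataFrame(
    {
        "Pclass": [3, 3, 2],
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
        "Fare": [7.83, 7.0, 9.69],
    }
)

# 'number' matches every numeric dtype (integers and floats alike).
numeric = df.select_dtypes(include="number")

# exclude= returns everything that does NOT match the given dtype.
non_numeric = df.select_dtypes(exclude="number")

print(list(numeric.columns))      # ['Pclass', 'Fare']
print(list(non_numeric.columns))  # ['Name']
```

Checking data.dtypes first is a quick way to see which columns each selection is likely to return.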

5. Replacing values in a DataFrame

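The replace function swaps the given values for new ones. It returns a new Series rather than modifying the data, so we assign the result back to the column. replace also accepts inplace=True, but calling it on a column selected with data['Sex'] may not update data in recent pandas versions, so re-assigning is the safer option.

data['Sex'] = data['Sex'].replace(['male', 'female'], ["M", "F"])

The above code replaces ‘male’ with ‘M’ and ‘female’ with ‘F’.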
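The same replacement can also be written with a dict, which keeps each old value next to its new value. The sketch below assumes a small made-up frame (the names raw and df are illustrative) and shows both the column-level and the DataFrame-level form.

```python
import pandas as pd

# Hypothetical frame; values are made up for illustration.
raw = pd.DataFrame({"Sex": ["male", "female", "male"], "Pclass": [3, 1, 2]})

# Column-level: a dict maps each old value to its replacement.
df = raw.copy()
df["Sex"] = df["Sex"].replace({"male": "M", "female": "F"})

# DataFrame-level: a nested dict limits the replacement to the named column.
df_nested = raw.replace({"Sex": {"male": "M", "female": "F"}})

print(df["Sex"].tolist())         # ['M', 'F', 'M']
print(df.equals(df_nested))       # True
print(raw["Sex"].tolist())        # ['male', 'female', 'male'], raw is unchanged
```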

References:

Link to the GitHub Repo: Data-Insight-s-Data-Scientist-Program-2021/Pandas Technique at master · Tanushree28/Data-Insight-s-Data-Scientist-Program-2021 (github.com)

Thank you for your time.

datainsightonline.com
