
Tanushree Nepal

Pandas Techniques For Data Manipulation



Python is fast becoming the preferred language in data science. It combines the broad ecosystem of a general-purpose programming language with mature scientific computing libraries. Pandas is a popular Python data analysis library. It provides easy-to-use and efficient data structures for working with numeric or labeled data stored in tabular form.

In this blog, we will explore some of the most important data manipulation techniques in pandas, using the Titanic dataset, which is available on Kaggle. The techniques discussed are:

  1. Reading a CSV File

  2. Dropping columns in the data

  3. Dropping rows in the data

  4. Select columns with specific data types

  5. Replacing values in a DataFrame

1. Reading a CSV file

The CSV (Comma Separated Values) format is quite popular for storing data. Many datasets are distributed as CSV files, which can be opened directly in software like Excel or loaded with a programming language like Python.

import pandas as pd
# read the csv data using pd.read_csv function
data = pd.read_csv('test.csv')
data.head()
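
As a minimal sketch of what read_csv does, the example below parses a small in-memory CSV instead of a file on disk. The three rows are made up for illustration and only mimic a few Titanic columns; with the real dataset you would pass the file path as shown above.

```python
import io
import pandas as pd

# Hypothetical sample rows that mimic a few Titanic columns (not real data)
csv_text = """PassengerId,Name,Sex,Age
1,Passenger A,male,34.5
2,Passenger B,female,47.0
3,Passenger C,male,
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # data type inferred for each column
print(sample.head(2))  # first two rows only
```

The empty Age field in the last row is read as a missing value (NaN), which is why that column is inferred as float.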

DataFrame provides a drop() method. It accepts a single label or a list of labels and deletes the corresponding rows or columns, depending on the value of the axis parameter (0 for rows, 1 for columns). Its signature is:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
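
To make the signature concrete, the sketch below suggests that the columns and index keywords do the same job as passing labels together with axis=1 or axis=0. The small DataFrame named toy is a made-up stand-in for the Titanic data.

```python
import pandas as pd

# Made-up stand-in for the Titanic data
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'SibSp': [1, 0, 2],
    'Parch': [0, 0, 1],
})

# These two calls drop the same column
by_axis = toy.drop('Parch', axis=1)
by_keyword = toy.drop(columns='Parch')

# These two calls drop the same row
row_by_axis = toy.drop(0, axis=0)
row_by_keyword = toy.drop(index=0)

# drop returns a new DataFrame; toy itself still has all three columns
print(by_axis.equals(by_keyword))
print(row_by_axis.equals(row_by_keyword))
print(list(toy.columns))
```

Because inplace defaults to False, the original DataFrame is left untouched and the result has to be stored in a new variable.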

2. Dropping columns in the data

df_dropped = data.drop('Parch', axis=1)
df_dropped.head()

The ‘Parch’ column is dropped from the data. Here axis=1 tells drop to look for the label ‘Parch’ among the columns. Note that drop returns a new DataFrame, so data itself is unchanged.


We can drop multiple columns at the same time using the following code:

# Drop multiple columns
df_dropped_multiple = data.drop(['SibSp', 'Name'], axis=1)
df_dropped_multiple.head()

The columns ‘SibSp’ and ‘Name’ are dropped from the data.


3. Dropping rows in the data

df_row_dropped = data.drop(2, axis=0)
df_row_dropped.head()

The row with index label 2 is dropped from the data. Here axis=0 tells drop to look for the label 2 in the row index.


We can drop multiple rows at the same time using the following code:

# Drop multiple rows
df_row_dropped_multiple = data.drop([1, 4], axis=0)
df_row_dropped_multiple.head()
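
One detail worth checking is that drop matches index labels, not positions, and the remaining rows keep their original labels. The sketch below uses a made-up five-row DataFrame to show the gap left in the index, and how reset_index(drop=True) renumbers the rows if that is wanted.

```python
import pandas as pd

# Made-up ages, default index labels 0..4
toy = pd.DataFrame({'Age': [22, 38, 26, 35, 54]})

# Remove the rows labelled 1 and 4
dropped = toy.drop([1, 4], axis=0)
print(list(dropped.index))      # remaining labels keep their old values

# Optionally renumber the remaining rows from 0
renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))
```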

4. Select columns with specific data types

The pandas select_dtypes function lets us specify a data type and select only the columns that match it.

# for integer data type
integer_data = data.select_dtypes('int')
integer_data.head()

# for float data type
float_data = data.select_dtypes('float')
float_data.head()

The above code selects the columns with integer data types into integer_data and the columns with float data types into float_data.
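
select_dtypes also takes include and exclude keywords, and the string 'number' covers integer and float columns together. The sketch below assumes a made-up three-column DataFrame rather than the Titanic file.

```python
import pandas as pd

# Made-up rows with one integer, one float and one text column
toy = pd.DataFrame({
    'Pclass': [3, 1, 3],
    'Fare': [7.25, 71.28, 7.92],
    'Sex': ['male', 'female', 'female'],
})

# Keep every numeric column (integers and floats)
numeric = toy.select_dtypes(include='number')

# Keep everything that is not numeric
non_numeric = toy.select_dtypes(exclude='number')

print(list(numeric.columns))
print(list(non_numeric.columns))
```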


5. Replacing values in a DataFrame

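The replace function swaps the given values for new ones. It returns a new object rather than modifying the original, so we assign the result back to the column. Setting inplace=True is an alternative to re-assigning, but on a selected column such as data['Sex'] it may not update the original DataFrame in recent pandas versions, so re-assignment is the safer choice.

data['Sex'] = data['Sex'].replace(['male', 'female'], ['M', 'F'])

The above code replaces ‘male’ with ‘M’ and ‘female’ with ‘F’.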
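
A dictionary is another way to express the same mapping. The sketch below uses a made-up three-row column and also shows that replace returns a new Series, so the original only changes once the result is assigned back.

```python
import pandas as pd

# Made-up column that mimics the Titanic 'Sex' values
toy = pd.DataFrame({'Sex': ['male', 'female', 'male']})

# A dictionary maps each old value to its replacement
shortened = toy['Sex'].replace({'male': 'M', 'female': 'F'})

# The original column is untouched until the result is assigned back
before = toy['Sex'].tolist()
toy['Sex'] = shortened
after = toy['Sex'].tolist()

print(before)
print(after)
```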



Thank you for your time.
