

Begin with these 5 tools to learn the pandas library effectively for data manipulation

Writer: 074bex435.ranjan

Data is everywhere, but it is rarely in proper order. Analysing it properly can reveal many insights. This post covers five techniques from the pandas library:

  1. Reading CSV files

  2. Crosstab

  3. Subsetting with .loc

  4. Cleaning empty cells

  5. Plotting


1. Reading CSV files: To work with data, we first need to import it. Data comes in many file formats, and one of the most common is CSV. pd.read_csv() loads a CSV file, from a local path or a URL, into a DataFrame, and data.head() shows its first five rows, as below.

import pandas as pd

# Load a CSV file straight from a URL into a DataFrame
data = pd.read_csv('https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv')

# Show the first five rows
data.head()
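The snippet below is a minimal sketch that runs without network access. It uses made-up CSV text held in memory through io.StringIO, and the column names first_name, last_name and city are hypothetical. It also shows header=None and names=, which matter when a file has no header row: by default read_csv treats the first line as column names.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row; io.StringIO lets read_csv
# treat an in-memory string like a file on disk.
csv_text = """Alice,Smith,Springfield
Bob,Jones,Riverside
Carol,Lee,Lakeside
"""

# header=None stops pandas from using the first data row as column names;
# names= supplies our own column labels instead.
people = pd.read_csv(io.StringIO(csv_text), header=None,
                     names=["first_name", "last_name", "city"])

print(people.head(2))   # first two rows
print(people.shape)     # (rows, columns)
print(people.dtypes)    # inferred type of each column
```

Without header=None, the row "Alice,Smith,Springfield" would become the column labels and the DataFrame would contain only two rows.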

2. Crosstab: pd.crosstab() summarizes large datasets by building a cross-tabulation, or frequency table. The values of one factor label the rows, the values of one or more other factors label the columns, and each cell counts how often that combination occurs.

# Import packages
import pandas as pd
import numpy as np

# Three arrays of equal length; position i in each array describes the same record
a = np.array(["hello", "hello", "hello", "hello", "hy", "hy", "hy", "hy", "hello", "hello"],
             dtype=object)

b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two"],
             dtype=object)

c = np.array(["handsome", "beautiful", "hy", "hy", "beautiful", "beautiful", "hy", "beautiful", "beautiful", "beautiful"],
             dtype=object)

# Form the cross tab: rows come from a, columns from the (b, c) pairs
pd.crosstab(a, [b, c], rownames=['greetings'], colnames=['number', 'feature'])


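Here, 'hello', 'one' and 'beautiful' occur together at the same index only once, so that cell holds 1. By contrast, 'hello', 'two' and 'beautiful' occur together twice (the last two positions), so that cell holds 2. The other cells are read the same way.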
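crosstab also works directly on DataFrame columns. The sketch below uses a small hypothetical survey table. margins=True appends "All" totals, and normalize="index" turns each row's counts into proportions that sum to 1.

```python
import pandas as pd

# Hypothetical survey data: one row per respondent
survey = pd.DataFrame({
    "greeting": ["hello", "hello", "hy", "hy", "hello", "hy"],
    "number":   ["one",   "two",   "one", "one", "one",  "two"],
})

# Rows = greeting, columns = number, cells = count of each pair.
# margins=True appends an "All" row and column with the totals.
counts = pd.crosstab(survey["greeting"], survey["number"], margins=True)
print(counts)

# normalize="index" divides each row by its total, giving row proportions
shares = pd.crosstab(survey["greeting"], survey["number"], normalize="index")
print(shares)
```

Here, 'hello' appears with 'one' twice and with 'two' once, so its row in shares reads about 0.67 and 0.33.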


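3. Subsetting with .loc: .loc selects rows, and optionally columns, by their index labels rather than by position. When a single label is passed, it returns the rows whose index equals that label: a Series if the label is unique, or a DataFrame if several rows share it. Below, avg_rating is used as the index, so sample.loc[4.5] picks the rows with an average rating of 4.5.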

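# Use the avg_rating column as the row index
sample = pd.read_csv('sample.csv', index_col='avg_rating')
sample.head()

# Select every row whose avg_rating label is 4.5
rated_4_5 = sample.loc[4.5]
rated_4_5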
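The contents of sample.csv are not shown, so the sketch below builds a small hypothetical table that is also indexed by avg_rating. It shows the common ways of calling .loc: a single label, a list of labels, a row label together with a column label, and a boolean mask.

```python
import pandas as pd

# Hypothetical ratings table, indexed by avg_rating like sample.csv above.
# 4.5 appears twice, so the index is not unique.
books = pd.DataFrame({
    "avg_rating": [4.5, 3.9, 4.5, 4.1],
    "title": ["A", "B", "C", "D"],
    "pages": [320, 180, 250, 410],
}).set_index("avg_rating")

# Single label: every row whose index equals 4.5 (a DataFrame here,
# because the label is duplicated)
top = books.loc[4.5]

# List of labels: matching rows, in the order the labels are given
picked = books.loc[[4.1, 3.9]]

# Row label plus column label: only the "title" cells of the 4.5 rows
top_titles = books.loc[4.5, "title"]

# Boolean mask: rows where the condition is True
long_books = books.loc[books["pages"] > 300]

print(top, picked, top_titles, long_books, sep="\n\n")
```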

4. Cleaning empty cells: Real-world data often contains empty cells, which pandas reads as NaN. These missing values can distort results, so we either drop the rows that contain them or fill the gaps with a suitable value.

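# Load the data; empty cells are read as NaN
data = pd.read_csv('data.csv')
data

# dropna() returns a new DataFrame without the rows that contain any NaN
new_data = data.dropna()
new_data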

Alternatively, we can fill the missing values in a column with that column's median:

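# Median of the non-missing values (NaN is skipped by default)
value = data["Calories"].median()

# Replace NaN in the Calories column with the median; assigning the result
# back avoids the chained inplace call that recent pandas versions warn about
data["Calories"] = data["Calories"].fillna(value)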
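Since data.csv itself is not shown, the sketch below uses a small hypothetical DataFrame with the Calories column above and an assumed Duration column. It counts the missing values per column, then compares three approaches: dropping every incomplete row, dropping only rows with a missing Calories value (subset=), and filling Calories with its median.

```python
import numpy as np
import pandas as pd

# Hypothetical workout data with some missing values (NaN)
df = pd.DataFrame({
    "Duration": [60, 45, np.nan, 30, 60],
    "Calories": [409.1, np.nan, 340.0, 282.4, np.nan],
})

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one NaN
dropped = df.dropna()

# Option 2: drop only the rows where Calories is missing
calories_known = df.dropna(subset=["Calories"])

# Option 3: fill NaN in Calories with that column's median
median_calories = df["Calories"].median()
filled = df.copy()
filled["Calories"] = filled["Calories"].fillna(median_calories)

print(dropped, calories_known, filled, sep="\n\n")
```

Dropping is simple but discards information. Filling with the median keeps every row and, unlike the mean, is not pulled around by a few extreme values.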


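5. Plotting: A picture can reveal a lot. Charts such as bar charts, scatter plots, histograms and pie charts make patterns in the data easier to see. DataFrame.plot() draws them with matplotlib, and by default it produces a line plot of every numeric column against the index.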

import matplotlib.pyplot as plt

# Line plot of every numeric column against the index
data.plot()

plt.show()

Other chart types can be drawn by passing the kind argument to plot(), for example kind='hist' or kind='bar'. A scatter plot (kind='scatter') also needs the x and y column names.


data.plot(kind='hist')
plt.show()
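As a sketch of the scatter and histogram cases, the example below uses hypothetical Duration and Calories values and draws both charts side by side. It selects matplotlib's non-interactive Agg backend and saves the figure to a file (workout_plots.png, an arbitrary name), so it also runs without a display. In a notebook, plt.show() would display the figure instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical workout data
df = pd.DataFrame({
    "Duration": [30, 45, 45, 60, 60, 90],
    "Calories": [195.1, 282.4, 300.0, 409.1, 374.0, 600.2],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plots need explicit x and y column names
df.plot(kind="scatter", x="Duration", y="Calories", ax=axes[0],
        title="Calories vs. duration")

# Histogram of a single column; bins sets the number of bars
df["Calories"].plot(kind="hist", bins=3, ax=axes[1],
                    title="Calories distribution")

fig.tight_layout()
fig.savefig("workout_plots.png")
```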



