
Kala Maya Sanyasi

Data Visualisation in Python

This post uses the Iris dataset.

First we import the Python libraries used to visualise the Iris dataset: pandas, seaborn and matplotlib.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Next we load the Iris flower dataset with the pd.read_csv method, passing the file path including the file name. After loading the file, we call the head() method to view the first five rows of the data.

iris = pd.read_csv("Iris.csv")
iris.head()
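
If Iris.csv is not at hand, the loading step can be tried on in-memory text first. The sketch below is only an illustration: it assumes a CSV with the same column names used in this post, and the three rows are sample values typed in for the example, not the full dataset.

```python
import io

import pandas as pd

# Illustrative stand-in for Iris.csv: same column names as in this post,
# but only three sample rows (assumption: the real file has this layout).
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())   # first rows (at most five)
print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # column types inferred by pandas
```

Checking shape and dtypes right after loading is a quick way to confirm that the measurement columns were read as numbers and not as text.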

To count the samples of each species, we use the value_counts() method, which returns the number of rows for each flower species.

iris["Species"].value_counts()

Output
Iris-virginica     50
Iris-versicolor    50
Iris-setosa        50
Name: Species, dtype: int64
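
value_counts() can also report proportions instead of absolute counts. The sketch below uses a small hypothetical Species column, made up only to illustrate the call; on the full dataset, with 50 rows per species, each share would be one third.

```python
import pandas as pd

# Small hypothetical Species column, only to illustrate the call
species = pd.Series(
    ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    name="Species",
)

counts = species.value_counts()                # absolute counts, largest first
shares = species.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```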

Scatter Plot

For the data visualisation, we first create a scatter plot of the data. We set kind to "scatter" and name the columns to use for the x and y axes of the plot.

iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")
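
The plot call returns a matplotlib Axes object, so a title and readable axis labels can be added afterwards. The following is a minimal sketch on a few sample rows. The DataFrame contents and the label texts are assumptions made for the example, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook

import pandas as pd

# A few sample rows using the column names from this post (illustrative only)
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# DataFrame.plot returns the Axes it drew on, which can be customised further
ax = df.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")
ax.set_title("Sepal length vs. sepal width")
ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Sepal width (cm)")
```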

We can also make a boxplot with pandas for each feature, split out by species:

iris.drop("Id", axis=1).boxplot(by="Species", figsize=(12, 6))

Output
array([[<AxesSubplot:title={'center':'PetalLengthCm'}, xlabel='[Species]'>,
        <AxesSubplot:title={'center':'PetalWidthCm'}, xlabel='[Species]'>],
       [<AxesSubplot:title={'center':'SepalLengthCm'}, xlabel='[Species]'>,
        <AxesSubplot:title={'center':'SepalWidthCm'}, xlabel='[Species]'>]],
      dtype=object)

Scatter plot using Seaborn

We can also create a scatter plot with the seaborn library. Seaborn's jointplot shows a scatter plot and univariate histograms in the same figure.

# height sets the figure size; it replaces the older size argument (seaborn 0.9 and later)
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris, height=5)

Here we cannot tell which point belongs to which species, so we use seaborn's FacetGrid to color the scatter plot by species and add a legend.

# height replaces the older size argument (seaborn 0.9 and later)
sns.FacetGrid(iris, hue="Species", height=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()
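
A shorter route to a species-colored scatter plot is seaborn's scatterplot function with the hue argument. It is offered here as an alternative sketch, not as the method used in this post, and it assumes seaborn 0.9 or later. The rows are a few sample values typed in for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook

import pandas as pd
import seaborn as sns

# A few sample rows using the column names from this post (illustrative only)
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# hue colors each point by its species and adds a legend automatically
ax = sns.scatterplot(data=df, x="SepalLengthCm", y="SepalWidthCm", hue="Species")

# Collect the legend entries to confirm every species is listed
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```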

Boxplot using Seaborn

We can also look at an individual feature in Seaborn through a boxplot as follows.

sns.boxplot(x="Species", y="PetalLengthCm", data=iris)

Output
<AxesSubplot:xlabel='Species', ylabel='PetalLengthCm'>

One way to extend this plot is to add a layer of individual points on top of it with Seaborn's stripplot. We use jitter=True so that the points don't all fall on a single vertical line above each species. Both calls draw on the current matplotlib axes, so the strip plot is layered on top of the box plot; each call returns those axes, which we keep in ax.

ax = sns.boxplot(x="Species", y="PetalLengthCm", data=iris)
ax = sns.stripplot(x="Species", y="PetalLengthCm", data=iris, jitter=True, edgecolor="gray")
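
The layering can be made explicit by creating the axes first and passing them to both calls through the ax argument. This is a minimal sketch under assumptions: the rows are a few sample values typed in for illustration, and the figure size and point color are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few sample rows using the column names from this post (illustrative only)
df = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# Create one figure with one axes, then draw both layers on it
fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(x="Species", y="PetalLengthCm", data=df, ax=ax)
sns.stripplot(x="Species", y="PetalLengthCm", data=df,
              jitter=True, color="black", ax=ax)
```

Passing ax explicitly keeps the overlay from depending on which axes happen to be current, which helps when a script builds several figures.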





