What story does your data tell?

Contents: Reading the Iris dataset, Data Visualization, Rule Based System, Conclusion


Data storytelling is the ability to effectively communicate insights from a dataset using narratives and visualizations. It can be used to put data insights into context for your audience and to inspire action from them. In this article, I’ll perform data exploration on the Iris dataset. The aim of this article is to illustrate the importance of data exploration and to create a rule-based system for classifying the Iris dataset using just data exploration.

The Iris flower dataset, sometimes called Fisher’s Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems” as an example of linear discriminant analysis. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. The three classes are Iris-setosa, Iris-versicolor and Iris-virginica.

All of the code and the Jupyter notebook can be found at

heebyyy/Iris-Medium-Article (github.com)

To start data exploration on this dataset, import the required libraries and load the data from the UCI Machine Learning Repository.

# Importing Libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Reading the dataset
column_names = ["sepal length","sepal width","petal length","petal width","Type of flower"]
iris = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",names=column_names)

The first five rows of the dataset are shown below.

first five rows of the iris dataset

Check the shape of the data with the code cell below.

# Shape of dataset
iris.shape

The output shows that the dataset has shape (150, 5): 150 rows and five columns. The five columns are sepal length, sepal width, petal length, petal width and Type of flower (the class each flower belongs to).

The dataset also has 50 instances of each of the three classes.

type of flower and number of instances
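The class counts can be checked with pandas' `value_counts`. The sketch below is only an illustration: it assumes pandas is installed and, so that it runs offline, it uses a six-row excerpt of the data (two rows per class) embedded as text. The names `sample_csv`, `sample` and `class_counts` are hypothetical and not part of the original notebook. Run on the full `iris` DataFrame loaded earlier, the same call should report 50 for each class.

```python
import io

import pandas as pd

column_names = ["sepal length", "sepal width", "petal length", "petal width", "Type of flower"]

# Six-row excerpt of iris.data (two rows per class), embedded so the example runs offline.
sample_csv = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""
sample = pd.read_csv(io.StringIO(sample_csv), names=column_names)

# Number of rows per class in the excerpt (two each here).
class_counts = sample["Type of flower"].value_counts()
print(class_counts)
```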

A 2D scatter plot of the sepal length and sepal width is created in the code cell below. Scatter plots are used to observe relationships between variables and use dots to represent values.

iris.plot(kind = 'scatter', x = 'sepal length', y = 'sepal width')

Below is the visualization of the 2D scatter plot.

2D Scatter plot of sepal width against sepal length

There isn’t much information we can extract from this plot, so color was added using the type of flower as the hue. The code cell and result are shown below.

# Scatter plot of sepal width against sepal length, color-coded by type of flower
# (recent seaborn versions use `height` in place of the older `size` argument)
sns.FacetGrid(iris, hue='Type of flower', height=6).map(plt.scatter, 'sepal length', 'sepal width').add_legend()

Scatter plot of sepal width against sepal length, color-coded by type of flower

From the visualization above, it can be observed that using only the sepal length and sepal width, a straight line can be drawn that clearly separates Iris-setosa from the other two classes.

The 3D scatter plot of the iris dataset can be seen on the plotly website. A 3D scatter plot needs a lot of mouse interaction to interpret, and you can visit the plotly website to perform this interaction. The code cell for creating the scatter plot and the resulting visualization are shown below.

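import plotly.express as px
df = px.data.iris()
# color='species' is an assumption here (the original call is cut off); it color-codes the three classes
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_width',
                    color='species')
fig.show()

3D Scatter plot of sepal length, sepal width and petal width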

It can also be concluded that by using sepal length, sepal width and petal width, we can draw two planes that separate the three classes of flowers, even though there are likely to be a few misclassifications between Iris-versicolor and Iris-virginica.

The 4D scatter plot, also known as a pair plot, is a plot that shows pairwise relationships in a dataset. The pair plot is created using the code cell below.

sns.pairplot(iris, hue='Type of flower', height=4, diag_kind='hist')

Below is the result of the 4D scatter plot.

The pair plot creates a plot of every column against all other columns, and the diagonal is the plot of a column against itself. The main focus is the highlighted visualization of the petal length and petal width. Using just these two features, we can separate the three classes by drawing lines on the plot. Even though there might still be a few misclassifications between Iris-versicolor and Iris-virginica, a rule-based system can be created for classifying these flowers.

If petal length is less than or equal to 2 and petal width is less than or equal to 1, then the flower is definitely Iris-setosa (blue color)
If petal length is less than 5 and greater than or equal to 2.5, and petal width is less than or equal to 2 and greater than or equal to 1, then it’s Iris-versicolor (orange color)
Else, it’s Iris-virginica (green color)
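The three rules above translate almost directly into code. The following is a minimal sketch rather than the notebook's own implementation; the function name `classify_iris` is hypothetical, and the inputs are assumed to be petal measurements in centimetres, as in the dataset.

```python
def classify_iris(petal_length, petal_width):
    """Classify a flower from its petal measurements using the three rules."""
    # Rule 1: short, narrow petals are Iris-setosa.
    if petal_length <= 2 and petal_width <= 1:
        return "Iris-setosa"
    # Rule 2: mid-sized petals are Iris-versicolor.
    if 2.5 <= petal_length < 5 and 1 <= petal_width <= 2:
        return "Iris-versicolor"
    # Rule 3: everything else is treated as Iris-virginica.
    return "Iris-virginica"


# Petal measurements of three flowers from the dataset, one per class.
print(classify_iris(1.4, 0.2))  # Iris-setosa
print(classify_iris(4.7, 1.4))  # Iris-versicolor
print(classify_iris(6.0, 2.5))  # Iris-virginica
```

Because the last rule is a catch-all, any measurement that matches neither of the first two rules (for example a petal length between 2 and 2.5) is labelled Iris-virginica.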

It can be concluded that petal length and petal width are the most useful features for identifying the various flower types, and that using data exploration and visualization, a rule-based system can be created (though not 100% accurate) for classifying Iris flowers into their different types.
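One way to put a number on "not 100% accurate" is to apply the rules to labelled rows and compare the predictions with the true classes. The sketch below assumes pandas and NumPy are installed; `apply_rules` and `sample` are hypothetical names. It runs on a six-row excerpt that deliberately includes two borderline flowers, so the accuracy it prints describes the excerpt only, not the full dataset.

```python
import numpy as np
import pandas as pd


def apply_rules(df):
    """Return the class predicted by the petal-based rules for every row."""
    length, width = df["petal length"], df["petal width"]
    conditions = [
        (length <= 2) & (width <= 1),
        (length >= 2.5) & (length < 5) & (width >= 1) & (width <= 2),
    ]
    choices = ["Iris-setosa", "Iris-versicolor"]
    # Rows matching neither condition fall through to Iris-virginica.
    predicted = np.select(conditions, choices, default="Iris-virginica")
    return pd.Series(predicted, index=df.index, name="predicted")


# Six rows taken from the dataset; the last two sit near the class boundary.
sample = pd.DataFrame(
    {
        "petal length": [1.4, 4.7, 6.0, 5.1, 5.0, 4.5],
        "petal width": [0.2, 1.4, 2.5, 1.9, 1.7, 1.7],
        "Type of flower": [
            "Iris-setosa",
            "Iris-versicolor",
            "Iris-virginica",
            "Iris-virginica",
            "Iris-versicolor",
            "Iris-virginica",
        ],
    }
)

predicted = apply_rules(sample)
accuracy = (predicted == sample["Type of flower"]).mean()
print(f"Accuracy on the excerpt: {accuracy:.2f}")

# Rows are the true classes, columns the predicted ones.
print(pd.crosstab(sample["Type of flower"], predicted))
```

Passing the full `iris` DataFrame to `apply_rules` in place of `sample` would give the accuracy of the rules on all 150 flowers.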

For more information, you can reach out to me by email at olayemibolaji1@gmail.com or connect with me on LinkedIn.

