The Iris flower data set, or Fisher’s Iris data set, is a multivariate data set used and made famous by the British statistician and biologist Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems”. It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphological variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula, “all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus”.
This famous iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are:
Iris setosa
Iris versicolor
Iris virginica
Five variables are included in iris: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
The summary below gives some statistics for each variable and shows that each species has 50 measurements.
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
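The size and balance of the data set can also be checked directly. This is a minimal sketch using only base R; the names n_rows_cols and per_species are illustrative and not part of the original analysis:

```r
# Dimensions of the built-in data frame: rows (flowers) and columns (variables)
n_rows_cols <- dim(iris)
n_rows_cols

# Number of flowers recorded for each species
per_species <- table(iris$Species)
per_species
```

If the data set is intact, this should report 150 rows, 5 columns, and 50 flowers per species, matching the summary above.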
The following scatterplot matrix shows the pairwise relationships between the four measurements. Sepal length and petal length are highly correlated, and petal length and petal width are even more strongly correlated.
pairs(iris[1:4], main= "Iris Data", pch=19)
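The relationships seen in the scatterplot matrix can also be checked numerically. The following is a minimal sketch, assuming that the Pearson correlation (the default of cor()) is an acceptable measure here; the name iris_cor is only illustrative:

```r
# Pearson correlation matrix of the four numeric measurements
iris_cor <- cor(iris[1:4])
round(iris_cor, 2)

# The two pairs discussed above
iris_cor["Sepal.Length", "Petal.Length"]  # about 0.87
iris_cor["Petal.Length", "Petal.Width"]   # about 0.96
```

Note that these coefficients pool all three species together, so part of the correlation reflects differences between species rather than variation within each one.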
Finally, the following boxplots show the distribution of each variable by species. I. virginica has the largest petals and sepals, except for sepal width, which is largest in I. setosa.
# Arrange the four boxplots in a 2 x 2 grid
par(mfrow = c(2, 2))
# One fill colour per species, in factor order: setosa, versicolor, virginica
species_cols <- c("slateblue2", "orange", "darkgreen")
boxplot(Sepal.Length ~ Species, data = iris, xlab = "Species", ylab = "Sepal length", col = species_cols)
boxplot(Sepal.Width ~ Species, data = iris, xlab = "Species", ylab = "Sepal width", col = species_cols)
boxplot(Petal.Length ~ Species, data = iris, xlab = "Species", ylab = "Petal length", col = species_cols)
boxplot(Petal.Width ~ Species, data = iris, xlab = "Species", ylab = "Petal width", col = species_cols)
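The pattern in the boxplots can be cross-checked with per-species averages. This is a minimal sketch using aggregate() from base R; species_means is an illustrative name, and the mean is only one possible summary (the boxplots themselves show medians):

```r
# Mean of every numeric variable, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means

# Species with the largest mean for each measurement
sapply(species_means[-1], function(x) as.character(species_means$Species[which.max(x)]))
```

I. virginica should come out on top for sepal length, petal length, and petal width, while I. setosa has the largest mean sepal width (about 3.43 cm against about 2.97 cm for I. virginica).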
That is all for now! Thanks for your attention!