• Descriptive Data Assessment
    • Edgar Anderson’s Iris Data
      • About the data set
      • Variables
      • Visualization

Descriptive Data Assessment


Edgar Anderson’s Iris Data

About the data set

These famous data were collected by Edgar Anderson; see Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.

The iris data set contains measurements of 4 attributes for 150 flowers: 50 flowers from each of 3 species.

# Show the first six rows of the iris data frame
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Variables

The iris data set consists of 5 variables: four measurements, Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width (all in cm), and the factor Species, which identifies each flower as Iris setosa, Iris versicolor, or Iris virginica.
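The variable types and group sizes can be checked directly. This is a minimal sketch using only base R functions on the built-in iris data frame; nothing here is specific to this report.

```r
# iris ships with base R (package datasets), so nothing needs to be loaded
str(iris)             # column types: four numeric columns and one factor
dim(iris)             # number of rows and columns
levels(iris$Species)  # names of the species
table(iris$Species)   # number of flowers per species
```

The output of table() should match the Species counts reported by summary(iris).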

Visualization

A statistical summary of every variable can be calculated with:

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
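The summary above pools all three species. As one possible next step, the means can be computed separately for each species with aggregate(); the object name species_means is only an illustrative choice.

```r
# Mean of each of the four measurements within each species
# (species_means is a hypothetical name chosen for this sketch)
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

Each row of the result holds the four mean measurements (in cm) for one species, which makes differences between species easier to see than in the pooled summary.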

Some useful graphics can be produced with:

# Scatterplot matrix of all pairs of variables
plot(iris)

# Box plots of sepal length, one box per species
boxplot(Sepal.Length~Species, data=iris)
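The scatterplot matrix is often easier to read when the points are coloured by species. The following is a sketch with base graphics; the three colours and the name species_cols are arbitrary choices made for illustration.

```r
# Map each flower's species (a factor with three levels) to a colour;
# the palette is an arbitrary choice for this sketch
species_cols <- c("red", "green3", "blue")[as.integer(iris$Species)]

# Scatterplot matrix of the four measurements, coloured by species
pairs(iris[, 1:4], col = species_cols, pch = 19,
      main = "Iris measurements by species")
```

Indexing the colour vector with as.integer(iris$Species) relies on the factor levels being stored in the order setosa, versicolor, virginica.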