Agenda


  • create univariate/multivariate box plots
  • interpret box plots
  • create horizontal box plots
  • detect outliers
  • modify box color
  • use formula to compare distributions of different variables
  • use notches to compare medians

Introduction


  • the box plot is a standardized way of displaying the distribution of data
  • box plots are useful for detecting outliers and for comparing distributions
  • a box plot shows the shape, central tendency and variability of the data

Structure


  • the description below assumes a horizontal box plot; in a vertical one (the default in R) the same elements are rotated by 90 degrees
  • the body of the box plot is a “box” (hence the name) that runs from the first quartile (Q1) to the third quartile (Q3)
  • within the box, a vertical line is drawn at Q2, the median of the data set
  • two horizontal lines, called whiskers, extend from the left and right ends of the box
  • the left whisker goes from Q1 to the smallest non-outlier in the data set, and the right whisker goes from Q3 to the largest non-outlier
  • if the data set includes one or more outliers, they are plotted separately as points on the chart
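
As a rough sketch of how these elements map to numbers, base R's boxplot.stats() returns the values that boxplot() draws. The vector x is made up for illustration and is not part of the tickers data. The sketch relies on R's default rule (coef = 1.5), which treats points lying more than 1.5 times the box length beyond the box as outliers; note that R uses hinges, which are close to but not always identical to Q1 and Q3.

```r
# Made-up sample (an assumption for illustration); 25 is deliberately extreme
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 10, 25)

# boxplot.stats() computes what boxplot() draws; coef = 1.5 is the default
bs <- boxplot.stats(x, coef = 1.5)

# lower whisker, lower hinge (~Q1), median, upper hinge (~Q3), upper whisker
bs$stats
## [1]  2.0  4.0  6.5  9.0 10.0

# values beyond the whiskers, plotted as separate points
bs$out
## [1] 25
```

Here the box runs from 4 to 9, so any value above 9 + 1.5 * 5 = 16.5 is flagged; the upper whisker therefore stops at 10 and 25 is drawn on its own.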

Data


library(readr)

daily_returns <- read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/tickers.csv')
daily_returns
## # A tibble: 250 x 5
##         AAPL       AMZN        FB       GOOG      MSFT
##        <dbl>      <dbl>     <dbl>      <dbl>     <dbl>
##  1  1.377845  24.169983  2.119995  22.409973  1.120701
##  2  2.834412   3.250000 -0.860001   5.989990  0.766800
##  3 -0.039360   9.910034  1.450005   6.750000  0.973240
##  4  0.108261   3.759949 -0.770004 -10.690002 -0.285091
##  5  1.643570  19.840027  4.750000   8.660034  0.501365
##  6  0.068894   5.330017 -0.299996  -0.929992  0.255596
##  7 -0.560975  -5.210022 -0.630005  -7.280030 -0.707809
##  8  0.551140   0.250000 -0.459999   0.690003  0.127796
##  9 -0.216522 -13.599975  0.030007   6.559997  0.078648
## 10 -0.108253  -4.250000  0.459999   2.600037  0.471878
## # ... with 240 more rows

Univariate Box Plot


boxplot(daily_returns$AAPL)

Horizontal Box Plot


boxplot(daily_returns$AAPL, horizontal = TRUE)
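
With horizontal = TRUE the values run along the x-axis, so a label for them goes in xlab. A minimal sketch, using simulated numbers as a stand-in for daily_returns$AAPL so that it runs without downloading the data; the title and label text are placeholders.

```r
set.seed(42)
aapl <- rnorm(250)  # simulated stand-in, not the real AAPL column

# values are on the x-axis in a horizontal box plot, so label it with xlab
bp <- boxplot(aapl, horizontal = TRUE, main = 'AAPL', xlab = 'Daily change')

# boxplot() invisibly returns the plotted statistics as a 5 x 1 matrix
bp$stats
```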

Color


boxplot(daily_returns$AAPL, col = 'blue')
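
col fills the box; the outline can be colored separately with the border argument. A small sketch on simulated data (an assumption, used in place of daily_returns$AAPL); the color names are arbitrary picks from R's built-in colors().

```r
set.seed(1)
aapl <- rnorm(250)  # simulated stand-in, not the real AAPL column

# col sets the fill of the box, border sets the color of the box outline
bp <- boxplot(aapl, col = 'lightblue', border = 'navy')
```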