## Agenda

• understand categorical data
• ordinal vs nominal data
• how to create factors
• how to check:
  • number of levels
  • names of levels
• how to create ordered factors
• how to:
  • tabulate levels
  • reorder levels
  • reverse levels
  • collapse levels
  • recode levels
  • recategorize levels
  • shift levels

## Libraries

```r
library(forcats)
library(magrittr)
library(dplyr)
library(descriptr)
```

## Factors

```r
str(descriptr::mtcarz)
```

```
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
```
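To make the `Factor w/ 3 levels` lines above concrete, here is a minimal base R sketch (the vector `x` is a made-up example, not part of `mtcarz`) showing that a factor stores integer codes alongside a character vector of levels:

```r
# A factor is an integer vector of codes plus a "levels" attribute
x <- factor(c("b", "a", "c", "a"))
levels(x)       # "a" "b" "c"  (sorted by default)
as.integer(x)   # 2 1 3 1      (each code indexes into levels(x))
typeof(x)       # "integer"
class(x)        # "factor"
str(x)          # Factor w/ 3 levels "a","b","c": 2 1 3 1
```

This is the same layout `str()` reports for `cyl`, `gear`, and the other factor columns: the levels first, then the codes.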

## Levels

```r
nlevels(mtcarz$gear)
```

```
## [1] 3
```
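A small sketch of what `nlevels()` counts, using base `mtcars` instead of `mtcarz` so it runs without descriptr. It counts declared levels, so a level with no observations still contributes (the extra level `6` below is an illustrative assumption; no car in `mtcars` has 6 gears):

```r
g <- factor(mtcars$gear)
nlevels(g)                                # 3
identical(nlevels(g), length(levels(g)))  # TRUE

# Declare a level that never occurs in the data
g5 <- factor(mtcars$gear, levels = c(3, 4, 5, 6))
nlevels(g5)   # 4
table(g5)     # counts: 3 -> 15, 4 -> 12, 5 -> 5, 6 -> 0
```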

## Level Names

```r
levels(mtcarz$gear)
```

```
## [1] "3" "4" "5"
```
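`levels()` returns the levels in their stored order, which for `factor()` on a character vector defaults to sorted order. A hypothetical `sizes` vector shows why that default is awkward for ordinal data and how the `levels` argument sets the order explicitly:

```r
sizes <- c("medium", "low", "high", "low")

# Default: levels are sorted alphabetically, not by meaning
levels(factor(sizes))
# "high"   "low"    "medium"

# Supplying levels fixes the order
levels(factor(sizes, levels = c("low", "medium", "high")))
# "low"    "medium" "high"
```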

## Ordered Factors

```r
ordered(mtcarz$gear, levels = c(3, 4, 5))
```

```
##  [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
## Levels: 3 < 4 < 5
```
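What the ordering buys: comparison operators and `max()`/`min()` respect the level order of an ordered factor, while on a plain (nominal) factor they are not meaningful. A sketch with base `mtcars`:

```r
g <- ordered(mtcars$gear, levels = c(3, 4, 5))
is.ordered(g)   # TRUE
g[1] > g[4]     # TRUE: the first car has 4 gears, the fourth has 3
max(g)          # 5 (printed with Levels: 3 < 4 < 5)

# On an unordered factor, ">" warns "not meaningful for factors" and returns NA
f <- factor(mtcars$gear)
suppressWarnings(f[1] > f[4])   # NA
```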

## Convert to Factor

```r
as.factor(mtcars$cyl)
```

```
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## Levels: 4 6 8
```
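A common pitfall after `as.factor()`: `as.numeric()` on a factor returns the internal level codes, not the original values. A sketch of the wrong way and two working ways to get the numbers back:

```r
cyl_f <- as.factor(mtcars$cyl)

# Wrong: these are level codes (4 -> 1, 6 -> 2, 8 -> 3)
head(as.numeric(cyl_f))                  # 2 2 1 2 3 2

# Right: go through the character labels ...
head(as.numeric(as.character(cyl_f)))    # 6 6 4 6 8 6

# ... or convert the levels once and index them by the codes
head(as.numeric(levels(cyl_f))[cyl_f])   # 6 6 4 6 8 6
```

The second form converts only the (few) levels rather than every element, which is why it is often suggested for long vectors.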

## Labels

```r
factor(mtcars$cyl, labels = c("four", "six", "eight"))
```

```
##  [1] six   six   four  six   eight six   eight four  four  six   six
## [12] eight eight eight eight eight eight four  four  four  four  eight
## [23] eight eight eight four  four  four  eight six   eight four
## Levels: four six eight
```
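`labels` are matched to levels by position, and the levels default to the sorted unique values (4, 6, 8 here). Passing `levels` alongside `labels` makes that mapping explicit, as in this sketch:

```r
cyl_lab <- factor(mtcars$cyl,
                  levels = c(4, 6, 8),                 # order of the levels
                  labels = c("four", "six", "eight"))  # one label per level, same order
levels(cyl_lab)   # "four" "six" "eight"
table(cyl_lab)    # four: 11, six: 7, eight: 14
```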

## Exclude Levels

```r
factor(mtcars$cyl, exclude = 4)
```

```
##  [1] 6    6    <NA> 6    8    6    8    <NA> <NA> 6    6    8    8    8
## [15] 8    8    8    <NA> <NA> <NA> <NA> 8    8    8    8    <NA> <NA> <NA>
## [29] 8    6    8    <NA>
## Levels: 6 8
```
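Excluded values are not dropped from the vector; they become `NA`, so its length stays 32. A quick check on base `mtcars`, which has 11 four-cylinder cars:

```r
cyl_ex <- factor(mtcars$cyl, exclude = 4)
length(cyl_ex)                  # 32: same length as the input
sum(is.na(cyl_ex))              # 11: every 4-cylinder car is now NA
levels(cyl_ex)                  # "6" "8"
table(cyl_ex, useNA = "ifany")  # 6: 7, 8: 14, <NA>: 11
```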

## Case Study: Data

```
## # A tibble: 48,232 x 1
##    traffics
##    <fct>
## # ... with 48,222 more rows
```

## Extract Column

```r
traffics <- use_series(web_traffic, traffics)
traffics
```
``````##     [1] google     google     google     google     google     google