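A histogram is a plot used to examine the shape and spread of continuous data. It looks similar to a bar graph and helps detect outliers and skewness in the data.
To construct a histogram, the data is split into intervals called bins, and the number of observations falling into each bin is counted. The x axis shows the variable and the y axis shows the count (frequency) in each bin.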
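As a minimal sketch of how a histogram is constructed, the binning and counting can be done by hand in base R. The `visits` vector and the bin edges below are hypothetical values chosen for illustration, not taken from the data used later.

```r
# Hypothetical sample of visit counts (not taken from the ecom data)
visits <- c(0, 1, 1, 2, 3, 3, 3, 5, 6, 8, 9, 10)

# Bin edges: five equal-width bins of width 2 covering 0 to 10
breaks <- seq(0, 10, by = 2)

# cut() assigns each value to a bin; include.lowest keeps 0 in the first bin
bins <- cut(visits, breaks = breaks, include.lowest = TRUE)

# table() counts the observations per bin, which gives the bar heights
counts <- table(bins)
print(counts)
```

Each count corresponds to the height of one bar; `geom_histogram()` performs an equivalent binning step before drawing.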
library(ggplot2)
library(dplyr)
library(tidyr)
# Read the web traffic data and print the first rows
ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
ecom
## # A tibble: 1,000 x 11
## id referrer device bouncers n_visit n_pages duration country
## <int> <chr> <chr> <chr> <int> <dbl> <dbl> <chr>
## 1 1 google laptop true 10 1 693 Czech Republic
## 2 2 yahoo tablet true 9 1 459 Yemen
## 3 3 direct laptop true 0 1 996 Brazil
## 4 4 bing tablet false 3 18 468 China
## 5 5 yahoo mobile true 9 1 955 Poland
## 6 6 yahoo laptop false 5 5 135 South Africa
## 7 7 yahoo mobile true 10 1 75 Bangladesh
## 8 8 direct mobile true 10 1 908 Indonesia
## 9 9 bing mobile false 3 19 209 Netherlands
## 10 10 google mobile true 6 1 208 Czech Republic
## # ... with 990 more rows, and 3 more variables: purchase <chr>,
## # order_items <dbl>, order_value <dbl>
# Basic histogram of the number of visits
ggplot(ecom) +
  geom_histogram(aes(n_visit))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
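The message above appears because neither `bins` nor `binwidth` was supplied, so 30 bins are used by default. As a rough starting point, and only as one possible rule of thumb, base R's `nclass.Sturges()` and `nclass.FD()` suggest a bin count from the data. The `visits` vector below is hypothetical and stands in for a column such as `n_visit`.

```r
# Hypothetical visit counts used only for illustration
visits <- c(0, 1, 1, 2, 3, 3, 3, 5, 6, 8, 9, 10)

# Sturges' rule: ceiling(log2(n) + 1) bins
n_sturges <- nclass.Sturges(visits)

# Freedman-Diaconis rule: bin count based on the interquartile range
n_fd <- nclass.FD(visits)

print(c(sturges = n_sturges, fd = n_fd))
```

A suggested count could then be passed as `bins` to `geom_histogram()` and adjusted by eye.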
# Set the number of bins
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7)
# Fill the bars with a color
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'blue')
# Make the fill semi-transparent
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'blue', alpha = 0.3)
# Set the outline color of the bars
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'white', color = 'blue')
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'blue', color = 'white')
# Set the bin width instead of the number of bins
ggplot(ecom) +
  geom_histogram(aes(n_visit), binwidth = 2, fill = 'blue', color = 'black')
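The `binwidth` argument sets the width of each bin rather than the number of bins. To check which bins a given choice produces, one option is ggplot2's `layer_data()`, which returns the computed bin edges and counts. The sketch below assumes a small hypothetical data frame `visits` and adds `boundary = 0` so that the bin edges fall on even numbers.

```r
library(ggplot2)

# Hypothetical data frame standing in for the ecom data
visits <- data.frame(n_visit = c(0, 1, 1, 2, 3, 3, 3, 5, 6, 8, 9, 10))

# Bins of width 2, with bin edges aligned to 0
p <- ggplot(visits) +
  geom_histogram(aes(n_visit), binwidth = 2, boundary = 0)

# Extract the data computed by the histogram layer
binned <- layer_data(p)
print(binned[, c("xmin", "xmax", "count")])
```

Each row of `binned` describes one bar, so the counts add up to the number of observations.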
# Change the line type of the bar outline
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 5, fill = 'white',
                 color = 'blue', linetype = 3)
# Thicken the bar outline; `linewidth` replaces `size` for lines from ggplot2 3.4.0 onward
ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 5, fill = 'white',
                 color = 'blue', linewidth = 1.25)
# Map the fill color to a variable
ggplot(ecom) +
  geom_histogram(aes(n_visit, fill = device), bins = 7)
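When `fill` is mapped to a variable, `geom_histogram()` stacks the groups on top of each other by default. If the aim is to compare the groups' distributions, one alternative is to overlay them with `position = "identity"` and some transparency. The sketch below assumes a small hypothetical data frame `visits` with two devices.

```r
library(ggplot2)

# Hypothetical data frame with two device groups
visits <- data.frame(
  n_visit = c(0, 1, 2, 3, 5, 8, 1, 4, 6, 7, 9, 10),
  device  = rep(c("laptop", "mobile"), each = 6)
)

# Overlay the groups instead of stacking them; alpha keeps both visible
p <- ggplot(visits) +
  geom_histogram(aes(n_visit, fill = device), bins = 7,
                 position = "identity", alpha = 0.5)

# Inspect the per-group bin counts computed by the layer
d <- layer_data(p)
print(d[, c("group", "xmin", "xmax", "count")])
```

Printing `p` draws the overlaid histogram; `facet_wrap(~ device)` is another option when the overlap is hard to read.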