Agenda


  • what are pipes?
  • why use pipes?
  • what are the different types of pipes?
  • combining operations with pipes
  • case studies

Introduction


R code often contains many parentheses when several operations are applied in sequence. In complex code this leads to nested function calls that are hard to read and maintain. The magrittr package by Stefan Milton Bache provides pipes that let us write such sequences as readable R code.

Pipes allow us to clearly express a sequence of multiple operations by:

  • structuring operations from left to right
  • avoiding
    • nested function calls
    • intermediate steps
    • overwriting of original data
  • minimizing creation of local variables

Pipes


Loading the tidyverse makes the %>% pipe available, but the other pipes require attaching magrittr explicitly with library(magrittr). We will look at 3 different types of pipes:

  • %>% : pipe a value forward into an expression or function call
  • %<>% : pipe a value forward and assign the result back to the left hand side object instead of returning it
  • %$% : expose names within left hand side objects to right hand side expressions
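
The three pipes listed above can be sketched side by side. This is a minimal illustration, assuming magrittr is installed; the vector `x` and the data frame `df` are hypothetical and not part of the deck's data.

```r
library(magrittr)

# Hypothetical vector, used only for illustration
x <- c(1, 4, 9, 16)

# %>% : pipe x forward as the first argument of sqrt(), then of sum()
total <- x %>% sqrt() %>% sum()

# %<>% : pipe y forward and assign the result back to y
y <- x
y %<>% sqrt()

# %$% : expose the columns of df to the right-hand side expression
df <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6))
r <- df %$% cor(a, b)

print(total)  # 10
print(y)      # 1 2 3 4
print(r)      # 1
```

Note that `%<>%` modifies `y` in place, so it is worth keeping a copy of the original object when the untransformed values are still needed.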

Libraries


Data



## # A tibble: 1,000 x 4
##    referrer n_pages duration purchase
##    <fct>      <dbl>    <dbl> <lgl>   
##  1 google         1      693 FALSE   
##  2 yahoo          1      459 FALSE   
##  3 direct         1      996 FALSE   
##  4 bing          18      468 TRUE    
##  5 yahoo          1      955 FALSE   
##  6 yahoo          5      135 FALSE   
##  7 yahoo          1       75 FALSE   
##  8 direct         1      908 FALSE   
##  9 bing          19      209 FALSE   
## 10 google         1      208 FALSE   
## # ... with 990 more rows

Data Dictionary


  • referrer: referrer website/search engine
  • n_pages: number of pages visited
  • duration: time spent on the website (in seconds)
  • purchase: whether the visitor made a purchase (TRUE/FALSE)

Sample Data


Using pipe


## # A tibble: 10 x 4
##    referrer n_pages duration purchase
##    <fct>      <dbl>    <dbl> <lgl>   
##  1 google         1      693 FALSE   
##  2 yahoo          1      459 FALSE   
##  3 direct         1      996 FALSE   
##  4 bing          18      468 TRUE    
##  5 yahoo          1      955 FALSE   
##  6 yahoo          5      135 FALSE   
##  7 yahoo          1       75 FALSE   
##  8 direct         1      908 FALSE   
##  9 bing          19      209 FALSE   
## 10 google         1      208 FALSE

Square Root


##  [1] 1.000000 1.000000 2.000000 1.000000 4.472136 1.000000 1.414214
##  [8] 1.000000 1.414214 1.000000

Square Root - Using pipe


##  [1] 1.000000 1.000000 2.000000 1.000000 4.472136 1.000000 1.414214
##  [8] 1.000000 1.414214 1.000000
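
The code behind the square root output is not preserved here, so the following is only a sketch of how the nested and piped forms might compare. The data frame `ecom_mini` is a hypothetical stand-in whose `n_pages` values are copied from the extracted-column output shown elsewhere in this deck.

```r
library(magrittr)

# Hypothetical stand-in for the sample data (n_pages values taken from the deck)
ecom_mini <- data.frame(n_pages = c(1, 1, 4, 1, 20, 1, 2, 1, 2, 1))

# Nested call: read from the inside out
nested <- sqrt(ecom_mini$n_pages)

# Piped: read from left to right
piped <- ecom_mini %>%
  use_series(n_pages) %>%
  sqrt()

print(piped)
identical(nested, piped)
```

Both forms compute the same vector; the piped version simply lists the steps in the order they are applied.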

Correlation


## [1] 0.4290905

Correlation - Using pipe


## [1] 0.4290905
## [1] 0.4290905
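
As a sketch of how the exposition pipe helps here: `%$%` makes the column names visible to `cor()`, so the data frame does not have to be repeated with `$`. The data below is hypothetical (only the column names follow the deck), so the value it yields will differ from the output above.

```r
library(magrittr)

# Hypothetical data with the same column names as the deck's data set
ecom <- data.frame(
  n_pages  = c(1, 1, 4, 1, 20, 1, 2, 1, 2, 1),
  duration = c(693, 459, 996, 468, 955, 135, 75, 908, 209, 208)
)

# Without pipes: columns are reached with $
r_base <- cor(ecom$n_pages, ecom$duration)

# With %$%: column names are used directly on the right-hand side
r_pipe <- ecom %$% cor(n_pages, duration)

identical(r_base, r_pipe)
```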

Visualization



Visualization - Using pipe
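
The plot and its code are not preserved here. One plausible sketch, assuming the slide showed a bar plot of visits per referrer, is given below; the `referrer` values are hypothetical.

```r
library(magrittr)

# Hypothetical referrer values, used only for illustration
ecom <- data.frame(
  referrer = c("google", "yahoo", "direct", "bing", "yahoo",
               "yahoo", "yahoo", "direct", "bing", "google")
)

# Nested: table() is evaluated first, then barplot()
barplot(table(ecom$referrer))

# Piped: extract the column, count each referrer, then plot the counts
counts <- ecom %>%
  use_series(referrer) %>%
  table()

counts %>% barplot()
```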


Regression


## 
## Call:
## lm(formula = duration ~ n_pages, data = ecom)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -386.45 -213.03  -38.93  179.31  602.55 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  404.803     11.323  35.750  < 2e-16 ***
## n_pages       -8.355      1.296  -6.449 1.76e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 263.3 on 998 degrees of freedom
## Multiple R-squared:   0.04,  Adjusted R-squared:  0.03904 
## F-statistic: 41.58 on 1 and 998 DF,  p-value: 1.756e-10

Regression - Using pipe


## 
## Call:
## lm(formula = duration ~ n_pages)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -386.45 -213.03  -38.93  179.31  602.55 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  404.803     11.323  35.750  < 2e-16 ***
## n_pages       -8.355      1.296  -6.449 1.76e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 263.3 on 998 degrees of freedom
## Multiple R-squared:   0.04,  Adjusted R-squared:  0.03904 
## F-statistic: 41.58 on 1 and 998 DF,  p-value: 1.756e-10
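
The two summaries above differ only in the Call line, which suggests that the piped version passed the columns through `%$%` instead of a `data` argument. A minimal sketch of that pattern, using hypothetical data with the deck's column names:

```r
library(magrittr)

# Hypothetical data with the same column names as the deck's data set
ecom <- data.frame(
  n_pages  = c(1, 1, 4, 1, 20, 1, 2, 1, 2, 1),
  duration = c(693, 459, 996, 468, 955, 135, 75, 908, 209, 208)
)

# Without pipes: the data frame is passed through the data argument
fit_base <- lm(duration ~ n_pages, data = ecom)
summary(fit_base)

# With pipes: %$% exposes the columns, so no data argument is needed
fit_pipe <- ecom %$% lm(duration ~ n_pages)
fit_pipe %>% summary()

# Same coefficients either way; only the recorded call differs
all.equal(coef(fit_base), coef(fit_pipe))
```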

String Manipulation


## [1] "JOVIAL"

String Manipulation - Using pipe


## [1] "JOVIAL"
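
The input string and the functions used are not preserved here. The sketch below assumes a hypothetical email address and uses base R string functions to take the first six characters of the part before the `@` and convert them to upper case.

```r
library(magrittr)

# Hypothetical input, chosen so that the result matches the output above
email <- "jovialcann@anymail.com"

# Nested: read from the inside out
nested <- toupper(substr(strsplit(email, "@")[[1]][1], 1, 6))

# Piped: each step on its own line, read from top to bottom
piped <- email %>%
  strsplit("@") %>%   # list holding c("jovialcann", "anymail.com")
  extract2(1) %>%     # take the character vector out of the list
  extract(1) %>%      # keep the part before the @
  substr(1, 6) %>%    # first six characters
  toupper()

print(piped)  # "JOVIAL"
```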

Data Extraction


  • extract()
  • extract2()
  • use_series()

Extract Column By Name


## # A tibble: 10 x 1
##    n_pages
##      <dbl>
##  1       1
##  2       1
##  3       4
##  4       1
##  5      20
##  6       1
##  7       2
##  8       1
##  9       2
## 10       1
## # A tibble: 10 x 1
##    n_pages
##      <dbl>
##  1       1
##  2       1
##  3       4
##  4       1
##  5      20
##  6       1
##  7       2
##  8       1
##  9       2
## 10       1

Extract Column By Position


## # A tibble: 10 x 1
##    n_pages
##      <dbl>
##  1       1
##  2       1
##  3       4
##  4       1
##  5      20
##  6       1
##  7       2
##  8       1
##  9       2
## 10       1

Extract Column (as vector)


##  [1]  1  1  4  1 20  1  2  1  2  1
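
The extraction aliases map onto the usual subsetting operators: `extract()` is `[`, `extract2()` is `[[`, and `use_series()` is `$`. The sketch below assumes a hypothetical two-column data frame in which `n_pages` is the second column.

```r
library(magrittr)

# Hypothetical sample; n_pages values are taken from the output above
ecom_mini <- data.frame(
  referrer = c("bing", "direct", "google", "direct", "google",
               "yahoo", "google", "bing", "yahoo", "direct"),
  n_pages  = c(1, 1, 4, 1, 20, 1, 2, 1, 2, 1)
)

by_name <- ecom_mini %>% extract("n_pages")    # same as ecom_mini["n_pages"]
by_pos  <- ecom_mini %>% extract(2)            # same as ecom_mini[2]
as_vec  <- ecom_mini %>% use_series(n_pages)   # same as ecom_mini$n_pages
as_vec2 <- ecom_mini %>% extract2("n_pages")   # same as ecom_mini[["n_pages"]]

str(by_name)  # a one-column data frame
str(as_vec)   # a plain numeric vector
```

`extract()` keeps the data frame structure, while `use_series()` and `extract2()` return the column itself.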

Sample List


Extract List Element By Name


##  [1]  1  1  4  1 20  1  2  1  2  1
##  [1]  1  1  4  1 20  1  2  1  2  1

Extract List Element By Position


##  [1] bing   direct google direct google yahoo  google bing   yahoo  direct
## Levels: bing direct social yahoo google
##  [1] bing   direct google direct google yahoo  google bing   yahoo  direct
## Levels: bing direct social yahoo google

Extract List Element (as vector)


##  [1]  1  1  4  1 20  1  2  1  2  1
##  [1]  1  1  4  1 20  1  2  1  2  1
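
The same aliases work on lists. This is a sketch with a hypothetical list `ecom_list`; the original sample list is not preserved here.

```r
library(magrittr)

# Hypothetical list, used only for illustration
ecom_list <- list(
  referrer = c("bing", "direct", "google"),
  n_pages  = c(1, 1, 4)
)

by_name   <- ecom_list %>% extract2("n_pages")    # same as ecom_list[["n_pages"]]
by_pos    <- ecom_list %>% extract2(1)            # same as ecom_list[[1]]
by_dollar <- ecom_list %>% use_series(n_pages)    # same as ecom_list$n_pages
sub_list  <- ecom_list %>% extract("n_pages")     # same as ecom_list["n_pages"]

print(by_name)       # 1 1 4
print(by_pos)        # "bing" "direct" "google"
class(sub_list)      # "list"
```

`extract2()` and `use_series()` return the element itself, whereas `extract()` returns a list of length one that still wraps the element.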

Arithmetic Operations


  • add()
  • subtract()
  • multiply_by()
  • multiply_by_matrix()
  • divide_by()
  • divide_by_int()
  • mod()
  • raise_to_power()

Addition


##  [1]  2  3  4  5  6  7  8  9 10 11
##  [1]  2  3  4  5  6  7  8  9 10 11
##  [1]  2  3  4  5  6  7  8  9 10 11

Multiplication


##  [1]  3  6  9 12 15 18 21 24 27 30
##  [1]  3  6  9 12 15 18 21 24 27 30
##  [1]  3  6  9 12 15 18 21 24 27 30

Division


##  [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
##  [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
##  [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
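
Each group of three identical outputs above is consistent with writing the same operation as an infix operator, as a magrittr alias, and as a backquoted operator inside a pipe. A minimal sketch of the three forms:

```r
library(magrittr)

x <- 1:10

# Addition: infix operator, magrittr alias, backquoted operator
add_infix  <- x + 1
add_alias  <- x %>% add(1)
add_quoted <- x %>% `+`(1)

# Multiplication
mul_infix  <- x * 3
mul_alias  <- x %>% multiply_by(3)
mul_quoted <- x %>% `*`(3)

# Division
div_infix  <- x / 2
div_alias  <- x %>% divide_by(2)
div_quoted <- x %>% `/`(2)

print(add_alias)  # 2 3 4 5 6 7 8 9 10 11
print(mul_alias)  # 3 6 9 12 15 18 21 24 27 30
print(div_alias)  # 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
```

The aliases exist mainly for readability inside a chain, where a named function reads more naturally than a backquoted operator.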

Power


##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
##  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
##  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
##  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
##  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
##  [1]   1   4   9  16  25  36  49  64  81 100
##  [1]   1   4   9  16  25  36  49  64  81 100
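
The first output above, the sequence 1 to 100, is what R returns for `1:10 ^ 2`, because `^` binds more tightly than `:`. The sketch below contrasts that pitfall with the parenthesized form and the `raise_to_power()` alias.

```r
library(magrittr)

# Pitfall: ^ has higher precedence than :, so this is 1:(10^2)
wrong <- 1:10 ^ 2
length(wrong)  # 100

# Parentheses force the sequence to be built first
squares <- (1:10) ^ 2

# In a pipe, 1:10 is evaluated before it is passed on
piped <- 1:10 %>% raise_to_power(2)

print(piped)  # 1 4 9 16 25 36 49 64 81 100
identical(squares, piped)
```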

Logical Operators


  • and()
  • or()
  • equals()
  • not()
  • is_greater_than()
  • is_weakly_greater_than()
  • is_less_than()
  • is_weakly_less_than()

Greater Than


##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Weakly Greater Than


##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
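
The comparison aliases follow the same pattern as the arithmetic ones. The threshold of 5 in this sketch is inferred from the outputs above, where the strict comparison is first TRUE at the sixth element and the weak one at the fifth.

```r
library(magrittr)

x <- 1:10

# Strictly greater than: infix, alias, backquoted operator
gt_infix  <- x > 5
gt_alias  <- x %>% is_greater_than(5)
gt_quoted <- x %>% `>`(5)

# Greater than or equal to ("weakly" greater than)
ge_infix  <- x >= 5
ge_alias  <- x %>% is_weakly_greater_than(5)
ge_quoted <- x %>% `>=`(5)

print(gt_alias)
print(ge_alias)
```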