Agenda


  • detect patterns
  • count occurrences of patterns
  • split strings
  • replace strings
  • extract patterns
  • locate patterns
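
As a rough orientation, each agenda item maps to a stringr function. The sketch below uses a made-up character vector `x` (not part of the dataset) and the simplest form of each call; treat it as an illustration rather than the exact code used later.

```r
library(stringr)

# Hypothetical sample input, not taken from the dataset
x <- c("apple pie", "banana split", "cherry tart")

detected  <- str_detect(x, "an")        # detect: does each string contain "an"?
counts    <- str_count(x, "a")          # count: how many times "a" occurs per string
pieces    <- str_split(x, " ")          # split: break each string on a space (a list)
replaced  <- str_replace(x, "a", "A")   # replace: swap the first "a" in each string
extracted <- str_extract(x, "[aeiou]")  # extract: first vowel in each string
located   <- str_locate(x, "e")         # locate: start/end of the first "e" (NA if absent)
```

Note that `str_replace()`, `str_extract()` and `str_locate()` act on the first match only; the `_all` variants (for example `str_replace_all()`) act on every match.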

Libraries


library(stringr)
library(dplyr)
library(readr)

Data


mockstring <- read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/mock_strings.csv')
mockstring
## # A tibble: 1,000 x 12
##       id image_url  domain imageurl email filename phone address url      
##    <int> <chr>      <chr>  <chr>    <chr> <chr>    <chr> <chr>   <chr>    
##  1     1 https://r~ addto~ http://~ mnew~ PedeMal~ 66-(~ 8 Anha~ https://~
##  2     2 https://r~ gmpg.~ http://~ mdan~ Loborti~ 351-~ 697 Ea~ http://d~
##  3     3 https://r~ samsu~ http://~ hgir~ CongueD~ 33-(~ 89 Dot~ https://~
##  4     4 https://r~ spoti~ http://~ pmcm~ Eleifen~ 86-(~ 98135 ~ http://i~
##  5     5 https://r~ wunde~ http://~ dris~ PurusPh~ 223-~ 7814 P~ https://~
##  6     6 https://r~ alexa~ http://~ cphl~ Element~ 420-~ 4897 L~ https://~
##  7     7 https://r~ googl~ http://~ kdod~ Mattis.~ 1-(7~ 53541 ~ http://v~
##  8     8 https://r~ ed.gov http://~ vhou~ PurusEu~ 62-(~ 4819 H~ https://~
##  9     9 https://r~ jigsy~ http://~ rdik~ JustoEt~ 1-(6~ 68096 ~ https://~
## 10    10 https://r~ jugem~ http://~ tdud~ Ante.ti~ 30-(~ 9595 S~ https://~
## # ... with 990 more rows, and 3 more variables: full_name <chr>,
## #   currency <chr>, passwords <chr>

Case Study


  • extract domain name from email addresses
  • extract image type from URL
  • extract image dimension from URL
  • extract extension from domain name
  • extract HTTP protocol from URL
  • extract domain name from URL
  • extract extension from URL
  • extract file type from URL
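
To make the first and fourth tasks concrete, here is a minimal sketch. The address `jdoe@example.com` is a made-up value in the same shape as the `email` column, and the regular expressions are one possible approach, not necessarily the one the case study follows.

```r
library(stringr)

# Hypothetical email address, shaped like the values in the email column
email <- "jdoe@example.com"

# Option 1: split on "@" and keep the second piece
domain <- str_split(email, "@")[[1]][2]

# Option 2: extract everything after the "@" with a lookbehind
domain_alt <- str_extract(email, "(?<=@).+")

# Extension: the run of non-dot characters at the end of the domain
extension <- str_extract(domain, "[^.]+$")
```

For a multi-part suffix such as `co.uk`, the last pattern returns only the final part (`uk`), so it may need adjusting depending on what counts as the extension.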

Sample Data


mock_data
## # A tibble: 10 x 4
##    email                        address            full_name      currency
##    <chr>                        <chr>              <chr>          <chr>   
##  1 mnewburn0@fastcompany.com    8 Anhalt Crossing  Mufi Ruit      ¥34.37  
##  2 mdankersley1@digg.com        697 East Avenue    Leese Furmagi~ $67.37  
##  3 hgirhard2@altervista.org     89 Dottie Circle   Blakelee Wils~ €33,85  
##  4 pmcmenamy3@sciencedirect.com 98135 Blue Bill P~ Terencio McIl~ €42,89  
##  5 drisbrough4@bandcamp.com     7814 Pennsylvania~ Debee McErlai~ €13,19  
##  6 cphlippi5@surveymonkey.com   4897 Little Fleur~ Fran Painten   ¥87.35  
##  7 kdodswell6@un.org            53541 Morrow Cent~ Frasco Bowich  $34.89  
##  8 vhourihane7@ovh.net          4819 Hermina Park~ Car Ponten     ¥41.66  
##  9 rdike8@timesonline.co.uk     68096 Monument Pa~ Tades Checcuc~ €70,80  
## 10 tdudbridge9@clickbank.net    9595 Spaight Aven~ Wilton Kemmey  €62,76
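
The slides do not show how `mock_data` is built. One plausible reading, given the matching rows and column names, is that it holds the first ten rows and four columns of `mockstring`. The sketch below assumes exactly that; `take_sample()` is a hypothetical helper and `demo` is a made-up stand-in tibble so the code runs without a download.

```r
library(dplyr)

# Hypothetical helper: keep the four columns shown above and the first n rows
take_sample <- function(data, n = 10) {
  data %>%
    select(email, address, full_name, currency) %>%
    head(n)
}

# Made-up stand-in data with the same column names (values are illustrative)
demo <- tibble(
  id        = 1:3,
  email     = c("a@example.com", "b@example.org", "c@example.net"),
  address   = c("1 First Street", "2 Second Street", "3 Third Street"),
  full_name = c("Ann Example", "Bob Example", "Cy Example"),
  currency  = c("$1.00", "$2.00", "$3.00")
)

sample_rows <- take_sample(demo, n = 2)

# With the real data this would be: mock_data <- take_sample(mockstring)
```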

Detect @