Agenda


In this module, we will learn to:

  • select
    • single column
    • multiple columns
    • distinct values in a column
  • limit the number of records returned
  • handle NULL values
  • and filter columns using the following operators
    • WHERE
    • AND, or & NOT
    • BETWEEN
    • IN
    • LIKE
  • aggregate data
  • order data
  • group data

What is SQL


  • stands for Structured Query Language
  • lets you access and manipulate databases

What can SQL do?


  • SQL can execute queries against a database
  • SQL can retrieve data from a database
  • SQL can insert records in a database
  • SQL can update records in a database
  • SQL can delete records from a database
  • SQL can create new databases
  • SQL can create new tables in a database

Data




## # A tibble: 1,000 x 6
##    device referrer purchase duration n_visit country       
##    <chr>  <chr>    <chr>       <dbl>   <int> <chr>         
##  1 laptop google   false         693      10 Czech Republic
##  2 tablet yahoo    false         459       9 Yemen         
##  3 laptop direct   false         996       0 Brazil        
##  4 tablet bing     true          468       3 China         
##  5 mobile yahoo    false         955       9 Poland        
##  6 laptop yahoo    false         135       5 South Africa  
##  7 mobile yahoo    false          75      10 Bangladesh    
##  8 mobile direct   false         908      10 Indonesia     
##  9 mobile bing     false         209       3 Netherlands   
## 10 mobile google   false         208       6 Czech Republic
## # ... with 990 more rows

Data Dictionary


  • id: row id
  • referrer: referrer website/search engine
  • device: device used to visit the website
  • duration: time spent on the website (in seconds)
  • country: country of origin
  • purchase: whether visitor purchased
  • n_visit: number of visits

Select Single Column


SELECT device FROM trade
Displaying records 1 - 6
device
laptop
tablet
laptop
tablet
mobile
laptop

Select Multiple Columns


Select the following fields from the trade table:

  • referrer
  • device
  • purchase

Select Multiple Columns


SELECT referrer, device, purchase  FROM trade
Displaying records 1 - 6
referrer device purchase
google laptop false
yahoo tablet false
direct laptop false
bing tablet true
yahoo mobile false
yahoo laptop false

Select All Columns


SELECT * FROM trade
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop google false 693 10 Czech Republic
tablet yahoo false 459 9 Yemen
laptop direct false 996 0 Brazil
tablet bing true 468 3 China
mobile yahoo false 955 9 Poland
laptop yahoo false 135 5 South Africa

Limit


SELECT * FROM trade limit 10
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop google false 693 10 Czech Republic
tablet yahoo false 459 9 Yemen
laptop direct false 996 0 Brazil
tablet bing true 468 3 China
mobile yahoo false 955 9 Poland
laptop yahoo false 135 5 South Africa

Distinct


SELECT distinct referrer FROM trade
5 records
referrer
google
yahoo
direct
bing
social

Filter


Now that we know how to select columns, let us focus on filtering data. In SQL, the WHERE keyword is used to extract only those records that fulfill a specified condition. Data filter based on both text and numeric values in a table. Below are a few comparison operators we can use:

  • =
  • <>
  • <
  • >
  • <=
  • >=

Filter


SELECT * FROM trade WHERE duration > 300
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop google false 693 10 Czech Republic
tablet yahoo false 459 9 Yemen
laptop direct false 996 0 Brazil
tablet bing true 468 3 China
mobile yahoo false 955 9 Poland
mobile direct false 908 10 Indonesia

Filter


SELECT * FROM trade WHERE device == 'mobile'
Displaying records 1 - 6
device referrer purchase duration n_visit country
mobile yahoo false 955 9 Poland
mobile yahoo false 75 10 Bangladesh
mobile direct false 908 10 Indonesia
mobile bing false 209 3 Netherlands
mobile google false 208 6 Czech Republic
mobile direct true 406 9 Ireland

And, Or & Not


The WHERE clause can be combined with other operators to filter data based on more than one condition or to create more complex conditions. Below are the operators:

  • AND - displays a record if all the conditions separated by AND is TRUE
  • OR - displays a record if any of the conditions separated by OR is TRUE
  • NOT - displays a record if the condition(s) is NOT TRUE

And, Or & Not


SELECT * FROM trade WHERE n_visit > 3 AND duration > 100
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop google false 693 10 Czech Republic
tablet yahoo false 459 9 Yemen
mobile yahoo false 955 9 Poland
laptop yahoo false 135 5 South Africa
mobile direct false 908 10 Indonesia
mobile google false 208 6 Czech Republic

And, Or & Not


In the next example, we will use both AND & OR. Our goal is to filter all rows from the trade table that follow the below conditions:

  • n_visit (visit count) is either equal to 3 or 5
  • device used to visit the website is either mobile or tablet

And, Or & Not


SELECT * FROM trade WHERE (n_visit == 5 OR n_visit == 3)  
  AND (device = 'mobile' OR device = 'tablet')
Displaying records 1 - 6
device referrer purchase duration n_visit country
tablet bing true 468 3 China
mobile bing false 209 3 Netherlands
tablet yahoo false 80 5 Philippines
tablet bing true 368 5 Peru
mobile google false 147 5 Brazil
mobile yahoo false 700 3 Canada

BETWEEN


The BETWEEN operator selects values within a given range and is inclusive: begin and end values are included. The values can be numbers, text, or dates. In the following example, we filter rows from the trade table where the visit count is between 1 and 3, and the device used to visit the website is mobile.

BETWEEN


SELECT * FROM trade WHERE n_visit BETWEEN 1 AND 3 AND device = 'mobile'
Displaying records 1 - 6
device referrer purchase duration n_visit country
mobile bing false 209 3 Netherlands
mobile direct false 501 2 El Salvador
mobile bing false 25 1 Ireland
mobile yahoo false 700 3 Canada
mobile direct false 234 1 Indonesia
mobile social false 44 1 France

IN


The IN operator allows us to specify multiple values in a WHERE clause. It is a shorthand for multiple OR conditions. In the below example, we filter rows from the trade table where the visit count is either 2 or 4 or 6 or 8 or 10. Instead of using multiple OR conditions, we use the IN operator.

IN


SELECT * FROM trade WHERE n_visit IN (2, 4, 6, 8, 10)
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop google false 693 10 Czech Republic
mobile yahoo false 75 10 Bangladesh
mobile direct false 908 10 Indonesia
mobile google false 208 6 Czech Republic
tablet direct false 132 6 Estonia
tablet social false 28 2 Namibia

IS NULL


A field with a NULL value is a field with no value. If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value. In the next example, we filter all rows from the trade table where the device column has NULL values.

IS NULL


SELECT * FROM trade WHERE device IS NULL

Table: 0 records

device referrer purchase duration n_visit country ——- ——— ——— ——— ——– ——–

LIKE


The LIKE operator is used to search for a specific pattern in a column. There are two wildcards used in conjunction with the LIKE operator:

  • % : represents zero, one, or multiple characters
  • _ : represents a single character

LIKE


SELECT * FROM trade WHERE country LIKE 'P%'
Displaying records 1 - 6
device referrer purchase duration n_visit country
mobile yahoo false 955 9 Poland
tablet yahoo false 80 5 Philippines
tablet bing true 368 5 Peru
laptop bing false 456 0 Portugal
tablet yahoo false 706 9 Philippines
tablet direct false 814 8 Philippines

LIKE


Let us look at another example where we filter all rows from the trade table where the name of the country should follow the below conditions:

  • name can start with any character
  • the second character must be o
  • it can have any type or number of characters after the second character

LIKE


SELECT * FROM trade WHERE country LIKE '_o%'
Displaying records 1 - 6
device referrer purchase duration n_visit country
mobile yahoo false 955 9 Poland
laptop yahoo false 135 5 South Africa
tablet social true 290 7 Colombia
mobile yahoo false 225 8 Colombia
laptop direct false 658 8 Colombia
laptop bing false 456 0 Portugal

Aggregate


Let us combine the aggregate statements with WHERE statement to filter data. SUM() returns the total sum of a numeric column.

SELECT SUM(n_visit) FROM trade
1 records
SUM(n_visit)
4972


SELECT SUM(n_visit) FROM trade WHERE n_visit > 5
1 records
SUM(n_visit)
3574

Aggregate


AVG() returns the average value of a numeric column.

SELECT AVG(n_visit) FROM trade
1 records
AVG(n_visit)
4.972


SELECT AVG(n_visit) FROM trade
  WHERE country LIKE 'P%'
1 records
AVG(n_visit)
5.07913669064748

Aggregate


MAX() returns the largest value of the selected column.

SELECT MAX(n_visit) FROM trade
1 records
MAX(n_visit)
10


SELECT MAX(n_visit) FROM trade
  WHERE device == 'tablet'
1 records
MAX(n_visit)
10

Aggregate


MIN() returns the smallest value of the selected column.

SELECT MIN(n_visit) FROM trade
1 records
MIN(n_visit)
0


SELECT MIN(n_visit) FROM trade
  WHERE duration BETWEEN 600 AND 900
1 records
MIN(n_visit)
0

Alias


SQL aliases are used to give a table, or a column in a table, a temporary name. They are often used to make column names more readable. An alias only exists for the duration of the query.

Alias


SELECT AVG(n_visit) AS avg_mobile FROM trade
  WHERE device == 'mobile'
1 records
avg_mobile
5.4796511627907

Alias


SELECT MAX(n_visit) AS max_visit FROM trade
1 records
max_visit
10

Alias


SELECT MIN(duration) AS min_duration FROM trade
1 records
min_duration
10

Order By


The ORDER BY keyword is used to sort the records in ascending or descending order. By default, the records are sorted in ascending order. Use the DESC keyword if you want to sort the records in descending order,

SELECT * FROM trade ORDER BY country
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop social false 60 8 Afghanistan
laptop yahoo false 180 10 Afghanistan
laptop social false 274 2 Afghanistan
tablet direct false 682 10 Afghanistan
tablet direct true 414 5 Afghanistan
tablet bing false 34 7 Afghanistan

Order By


SELECT * FROM trade ORDER BY duration
Displaying records 1 - 6
device referrer purchase duration n_visit country
tablet yahoo false 10 5 Poland
laptop social false 10 1 Finland
laptop yahoo false 12 0 Indonesia
mobile social false 12 2 Botswana
laptop direct false 13 0 Czech Republic
mobile direct false 17 10 China

Order By


SELECT * FROM trade ORDER BY n_visit DESC
Displaying records 1 - 6
device referrer purchase duration n_visit country
laptop google false 693 10 Czech Republic
mobile yahoo false 75 10 Bangladesh
mobile direct false 908 10 Indonesia
mobile google false 338 10 Russia
mobile google false 479 10 Cameroon
mobile direct false 17 10 China

Group By


SELECT device, count(*) AS visits
  FROM trade
  GROUP BY device
  ORDER by visits DESC
3 records
device visits
mobile 344
tablet 331
laptop 325

Group By


SELECT device, MAX(duration) AS max_duration
  FROM trade
  GROUP BY device
  ORDER by max_duration DESC
3 records
device max_duration
tablet 999
laptop 997
mobile 994

Summary


In this module we learnt to

  • select
    • single column
    • multiple columns
    • distinct values in a column
  • limit the number of records returned
  • handle NULL values
  • and filter columns using the following operators
    • WHERE
    • AND, or & NOT
    • BETWEEN
    • IN
    • LIKE
  • aggregate data
  • order data
  • group data