Agenda
In this module, we will learn to:
- select
- single column
- multiple columns
- distinct values in a column
- limit the number of records returned
- handle NULL values
- and filter columns using the following operators
- WHERE
- AND, or & NOT
- BETWEEN
- IN
- LIKE
- aggregate data
- order data
- group data
What is SQL
- stands for Structured Query Language
- lets you access and manipulate databases
What can SQL do?
- SQL can execute queries against a database
- SQL can retrieve data from a database
- SQL can insert records in a database
- SQL can update records in a database
- SQL can delete records from a database
- SQL can create new databases
- SQL can create new tables in a database
Data
## # A tibble: 1,000 x 6
## device referrer purchase duration n_visit country
## <chr> <chr> <chr> <dbl> <int> <chr>
## 1 laptop google false 693 10 Czech Republic
## 2 tablet yahoo false 459 9 Yemen
## 3 laptop direct false 996 0 Brazil
## 4 tablet bing true 468 3 China
## 5 mobile yahoo false 955 9 Poland
## 6 laptop yahoo false 135 5 South Africa
## 7 mobile yahoo false 75 10 Bangladesh
## 8 mobile direct false 908 10 Indonesia
## 9 mobile bing false 209 3 Netherlands
## 10 mobile google false 208 6 Czech Republic
## # ... with 990 more rows
Data Dictionary
- id: row id
- referrer: referrer website/search engine
- device: device used to visit the website
- duration: time spent on the website (in seconds)
- country: country of origin
- purchase: whether visitor purchased
- n_visit: number of visits
Select Single Column
Displaying records 1 - 6
laptop |
tablet |
laptop |
tablet |
mobile |
laptop |
Select Multiple Columns
Select the following fields from the trade
table:
Select Multiple Columns
SELECT referrer, device, purchase FROM trade
Displaying records 1 - 6
google |
laptop |
false |
yahoo |
tablet |
false |
direct |
laptop |
false |
bing |
tablet |
true |
yahoo |
mobile |
false |
yahoo |
laptop |
false |
Select All Columns
Displaying records 1 - 6
laptop |
google |
false |
693 |
10 |
Czech Republic |
tablet |
yahoo |
false |
459 |
9 |
Yemen |
laptop |
direct |
false |
996 |
0 |
Brazil |
tablet |
bing |
true |
468 |
3 |
China |
mobile |
yahoo |
false |
955 |
9 |
Poland |
laptop |
yahoo |
false |
135 |
5 |
South Africa |
Limit
SELECT * FROM trade limit 10
Displaying records 1 - 6
laptop |
google |
false |
693 |
10 |
Czech Republic |
tablet |
yahoo |
false |
459 |
9 |
Yemen |
laptop |
direct |
false |
996 |
0 |
Brazil |
tablet |
bing |
true |
468 |
3 |
China |
mobile |
yahoo |
false |
955 |
9 |
Poland |
laptop |
yahoo |
false |
135 |
5 |
South Africa |
Distinct
SELECT distinct referrer FROM trade
5 records
google |
yahoo |
direct |
bing |
social |
Filter
Now that we know how to select columns, let us focus on filtering data. In SQL, the WHERE keyword is used to extract only those records that fulfill a specified condition. Data filter based on both text and numeric values in a table. Below are a few comparison operators we can use:
Filter
SELECT * FROM trade WHERE duration > 300
Displaying records 1 - 6
laptop |
google |
false |
693 |
10 |
Czech Republic |
tablet |
yahoo |
false |
459 |
9 |
Yemen |
laptop |
direct |
false |
996 |
0 |
Brazil |
tablet |
bing |
true |
468 |
3 |
China |
mobile |
yahoo |
false |
955 |
9 |
Poland |
mobile |
direct |
false |
908 |
10 |
Indonesia |
Filter
SELECT * FROM trade WHERE device == 'mobile'
Displaying records 1 - 6
mobile |
yahoo |
false |
955 |
9 |
Poland |
mobile |
yahoo |
false |
75 |
10 |
Bangladesh |
mobile |
direct |
false |
908 |
10 |
Indonesia |
mobile |
bing |
false |
209 |
3 |
Netherlands |
mobile |
google |
false |
208 |
6 |
Czech Republic |
mobile |
direct |
true |
406 |
9 |
Ireland |
And, Or & Not
The WHERE clause can be combined with other operators to filter data based on more than one condition or to create more complex conditions. Below are the operators:
- AND - displays a record if all the conditions separated by AND is TRUE
- OR - displays a record if any of the conditions separated by OR is TRUE
- NOT - displays a record if the condition(s) is NOT TRUE
And, Or & Not
SELECT * FROM trade WHERE n_visit > 3 AND duration > 100
Displaying records 1 - 6
laptop |
google |
false |
693 |
10 |
Czech Republic |
tablet |
yahoo |
false |
459 |
9 |
Yemen |
mobile |
yahoo |
false |
955 |
9 |
Poland |
laptop |
yahoo |
false |
135 |
5 |
South Africa |
mobile |
direct |
false |
908 |
10 |
Indonesia |
mobile |
google |
false |
208 |
6 |
Czech Republic |
And, Or & Not
In the next example, we will use both AND & OR. Our goal is to filter all rows from the trade
table that follow the below conditions:
n_visit
(visit count) is either equal to 3 or 5
device
used to visit the website is either mobile
or tablet
And, Or & Not
SELECT * FROM trade WHERE (n_visit == 5 OR n_visit == 3)
AND (device = 'mobile' OR device = 'tablet')
Displaying records 1 - 6
tablet |
bing |
true |
468 |
3 |
China |
mobile |
bing |
false |
209 |
3 |
Netherlands |
tablet |
yahoo |
false |
80 |
5 |
Philippines |
tablet |
bing |
true |
368 |
5 |
Peru |
mobile |
google |
false |
147 |
5 |
Brazil |
mobile |
yahoo |
false |
700 |
3 |
Canada |
BETWEEN
The BETWEEN operator selects values within a given range and is inclusive: begin and end values are included. The values can be numbers, text, or dates. In the following example, we filter rows from the trade
table where the visit count is between 1 and 3, and the device
used to visit the website is mobile
.
BETWEEN
SELECT * FROM trade WHERE n_visit BETWEEN 1 AND 3 AND device = 'mobile'
Displaying records 1 - 6
mobile |
bing |
false |
209 |
3 |
Netherlands |
mobile |
direct |
false |
501 |
2 |
El Salvador |
mobile |
bing |
false |
25 |
1 |
Ireland |
mobile |
yahoo |
false |
700 |
3 |
Canada |
mobile |
direct |
false |
234 |
1 |
Indonesia |
mobile |
social |
false |
44 |
1 |
France |
IN
The IN operator allows us to specify multiple values in a WHERE clause. It is a shorthand for multiple OR conditions. In the below example, we filter rows from the trade
table where the visit count is either 2 or 4 or 6 or 8 or 10. Instead of using multiple OR conditions, we use the IN operator.
IN
SELECT * FROM trade WHERE n_visit IN (2, 4, 6, 8, 10)
Displaying records 1 - 6
laptop |
google |
false |
693 |
10 |
Czech Republic |
mobile |
yahoo |
false |
75 |
10 |
Bangladesh |
mobile |
direct |
false |
908 |
10 |
Indonesia |
mobile |
google |
false |
208 |
6 |
Czech Republic |
tablet |
direct |
false |
132 |
6 |
Estonia |
tablet |
social |
false |
28 |
2 |
Namibia |
IS NULL
A field with a NULL value is a field with no value. If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value. In the next example, we filter all rows from the trade
table where the device
column has NULL values.
IS NULL
SELECT * FROM trade WHERE device IS NULL
Table: 0 records
device referrer purchase duration n_visit country ——- ——— ——— ——— ——– ——–
LIKE
The LIKE operator is used to search for a specific pattern in a column. There are two wildcards used in conjunction with the LIKE operator:
%
: represents zero, one, or multiple characters
_
: represents a single character
LIKE
SELECT * FROM trade WHERE country LIKE 'P%'
Displaying records 1 - 6
mobile |
yahoo |
false |
955 |
9 |
Poland |
tablet |
yahoo |
false |
80 |
5 |
Philippines |
tablet |
bing |
true |
368 |
5 |
Peru |
laptop |
bing |
false |
456 |
0 |
Portugal |
tablet |
yahoo |
false |
706 |
9 |
Philippines |
tablet |
direct |
false |
814 |
8 |
Philippines |
LIKE
Let us look at another example where we filter all rows from the trade
table where the name of the country should follow the below conditions:
- name can start with any character
- the second character must be
o
- it can have any type or number of characters after the second character
LIKE
SELECT * FROM trade WHERE country LIKE '_o%'
Displaying records 1 - 6
mobile |
yahoo |
false |
955 |
9 |
Poland |
laptop |
yahoo |
false |
135 |
5 |
South Africa |
tablet |
social |
true |
290 |
7 |
Colombia |
mobile |
yahoo |
false |
225 |
8 |
Colombia |
laptop |
direct |
false |
658 |
8 |
Colombia |
laptop |
bing |
false |
456 |
0 |
Portugal |
Aggregate
Let us combine the aggregate statements with WHERE statement to filter data. SUM()
returns the total sum of a numeric column.
SELECT SUM(n_visit) FROM trade
SELECT SUM(n_visit) FROM trade WHERE n_visit > 5
Aggregate
AVG()
returns the average value of a numeric column.
SELECT AVG(n_visit) FROM trade
SELECT AVG(n_visit) FROM trade
WHERE country LIKE 'P%'
1 records
5.07913669064748 |
Aggregate
MAX()
returns the largest value of the selected column.
SELECT MAX(n_visit) FROM trade
SELECT MAX(n_visit) FROM trade
WHERE device == 'tablet'
Aggregate
MIN()
returns the smallest value of the selected column.
SELECT MIN(n_visit) FROM trade
SELECT MIN(n_visit) FROM trade
WHERE duration BETWEEN 600 AND 900
Alias
SQL aliases are used to give a table, or a column in a table, a temporary name. They are often used to make column names more readable. An alias only exists for the duration of the query.
Alias
SELECT AVG(n_visit) AS avg_mobile FROM trade
WHERE device == 'mobile'
1 records
5.4796511627907 |
Alias
SELECT MAX(n_visit) AS max_visit FROM trade
Alias
SELECT MIN(duration) AS min_duration FROM trade
Order By
The ORDER BY keyword is used to sort the records in ascending or descending order. By default, the records are sorted in ascending order. Use the DESC keyword if you want to sort the records in descending order,
SELECT * FROM trade ORDER BY country
Displaying records 1 - 6
laptop |
social |
false |
60 |
8 |
Afghanistan |
laptop |
yahoo |
false |
180 |
10 |
Afghanistan |
laptop |
social |
false |
274 |
2 |
Afghanistan |
tablet |
direct |
false |
682 |
10 |
Afghanistan |
tablet |
direct |
true |
414 |
5 |
Afghanistan |
tablet |
bing |
false |
34 |
7 |
Afghanistan |
Order By
SELECT * FROM trade ORDER BY duration
Displaying records 1 - 6
tablet |
yahoo |
false |
10 |
5 |
Poland |
laptop |
social |
false |
10 |
1 |
Finland |
laptop |
yahoo |
false |
12 |
0 |
Indonesia |
mobile |
social |
false |
12 |
2 |
Botswana |
laptop |
direct |
false |
13 |
0 |
Czech Republic |
mobile |
direct |
false |
17 |
10 |
China |
Order By
SELECT * FROM trade ORDER BY n_visit DESC
Displaying records 1 - 6
laptop |
google |
false |
693 |
10 |
Czech Republic |
mobile |
yahoo |
false |
75 |
10 |
Bangladesh |
mobile |
direct |
false |
908 |
10 |
Indonesia |
mobile |
google |
false |
338 |
10 |
Russia |
mobile |
google |
false |
479 |
10 |
Cameroon |
mobile |
direct |
false |
17 |
10 |
China |
Group By
SELECT device, count(*) AS visits
FROM trade
GROUP BY device
ORDER by visits DESC
3 records
mobile |
344 |
tablet |
331 |
laptop |
325 |
Group By
SELECT device, MAX(duration) AS max_duration
FROM trade
GROUP BY device
ORDER by max_duration DESC
3 records
tablet |
999 |
laptop |
997 |
mobile |
994 |
Summary
In this module we learnt to
- select
- single column
- multiple columns
- distinct values in a column
- limit the number of records returned
- handle NULL values
- and filter columns using the following operators
- WHERE
- AND, or & NOT
- BETWEEN
- IN
- LIKE
- aggregate data
- order data
- group data