SQL I¶

Introducing SQL and databases.

Acknowledgments Page

Starting Up SQL¶

Before we look at SQL syntax in detail, let's first get ourselves set up to run SQL queries in Jupyter.

Approach #1: SQL Magic¶

1. Load the sql Module.

Load %%sql cell magic.

In [2]:
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

2. Connect to a database.

Here, we connect to the SQLite database basic_examples.db.

In [3]:
%%sql
sqlite:///data/basic_examples.db

3. Run a simple SQL query.

Note the %%sql lets Jupyter parse the rest of the lines as a SQL command.

In [4]:
%%sql
SELECT * FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[4]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0
puff 2010 100
smaug 2011 None

Simple query, this time on two different lines.

In [5]:
%%sql
SELECT *
FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[5]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0
puff 2010 100
smaug 2011 None

Approach #2: pd.read_sql¶

It turns out that pandas has a special-purpose function to parse SQL queries. We can pass in a SQL query as a string to return a pandas DataFrame. To achieve the same result as we did using cell magic above, we can do the following.

1. Connect to a database

In [6]:
import sqlalchemy 
import pandas as pd

engine = sqlalchemy.create_engine("sqlite:///data/basic_examples.db")
connection = engine.connect()

2. Run a simple SQL query

In [7]:
query = """
SELECT * 
FROM Dragon;
"""

pd.read_sql(query, engine)
Out[7]:
name year cute
0 hiccup 2010 10.0
1 drogon 2011 -100.0
2 dragon 2 2019 0.0
3 puff 2010 100.0
4 smaug 2011 NaN

Tables and Schema¶

A database contains a collection of SQL tables. Let's connect to our "toy" database basic_examples.db and explore the tables it stores.

In [8]:
%%sql
sqlite:///data/basic_examples.db
In [9]:
%%sql
SELECT * FROM sqlite_master WHERE type='table'
 * sqlite:///data/basic_examples.db
Done.
Out[9]:
type name tbl_name rootpage sql
table sqlite_sequence sqlite_sequence 7 CREATE TABLE sqlite_sequence(name,seq)
table Dragon Dragon 2 CREATE TABLE Dragon (
name TEXT PRIMARY KEY,
year INTEGER CHECK (year >= 2000),
cute INTEGER
)
table Dish Dish 4 CREATE TABLE Dish (
name TEXT PRIMARY KEY,
type TEXT,
cost INTEGER CHECK (cost >= 0)
)
table Scene Scene 6 CREATE TABLE Scene (
id INTEGER PRIMARY KEY AUTOINCREMENT,
biome TEXT NOT NULL,
city TEXT NOT NULL,
visitors INTEGER CHECK (visitors >= 0),
created_at DATETIME DEFAULT (DATETIME('now'))
)

Basic Queries¶

Every SQL query must contain a SELECT and FROM clause.

  • SELECT: specify the column(s) to return in the output
  • FROM: specify the database table from which to extract data
In [10]:
%%sql
SELECT * FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[10]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0
puff 2010 100
smaug 2011 None
In [11]:
%%sql
SELECT cute, year FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[11]:
cute year
10 2010
-100 2011
0 2019
100 2010
None 2011

Aliasing with AS

In [12]:
%%sql
SELECT cute AS cuteness,
       year AS birth
FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[12]:
cuteness birth
10 2010
-100 2011
0 2019
100 2010
None 2011

Uniqueness with DISTINCT

In [13]:
%%sql
SELECT DISTINCT year
FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[13]:
year
2010
2011
2019

Filtering with WHERE

In [14]:
%%sql
SELECT name, year
FROM Dragon
WHERE cute > 0;
 * sqlite:///data/basic_examples.db
Done.
Out[14]:
name year
hiccup 2010
puff 2010
In [15]:
%%sql
SELECT name, cute, year
FROM Dragon
WHERE cute > 0 OR year > 2013;
 * sqlite:///data/basic_examples.db
Done.
Out[15]:
name cute year
hiccup 10 2010
dragon 2 0 2019
puff 100 2010
In [16]:
%%sql
SELECT name, year
FROM Dragon 
WHERE name IN ("puff", "hiccup");
 * sqlite:///data/basic_examples.db
Done.
Out[16]:
name year
hiccup 2010
puff 2010
In [17]:
%%sql
SELECT name, cute
FROM Dragon
WHERE cute IS NOT NULL;
 * sqlite:///data/basic_examples.db
Done.
Out[17]:
name cute
hiccup 10
drogon -100
dragon 2 0
puff 100

Ordering data using ORDER BY

In [18]:
%%sql
SELECT *
FROM Dragon
ORDER BY cute DESC;
 * sqlite:///data/basic_examples.db
Done.
Out[18]:
name year cute
puff 2010 100
hiccup 2010 10
dragon 2 2019 0
drogon 2011 -100
smaug 2011 None

Restricting output with LIMIT and OFFSET

In [19]:
%%sql
SELECT *
FROM Dragon
LIMIT 2;
 * sqlite:///data/basic_examples.db
Done.
Out[19]:
name year cute
hiccup 2010 10
drogon 2011 -100
In [20]:
%%sql
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
 * sqlite:///data/basic_examples.db
Done.
Out[20]:
name year cute
drogon 2011 -100
dragon 2 2019 0

Grouping Data with GROUP BY¶

In [21]:
%%sql
SELECT *
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[21]:
name type cost
ravioli entree 10
ramen entree 13
taco entree 7
edamame appetizer 4
fries appetizer 4
potsticker appetizer 4
ice cream dessert 5

A small note: the fact that type is highlighted in green below is a consequence of Jupyter assuming that we are writing Python code (where type is a built-in keyword). type does not have a special meaning in SQL, so the color below does not indicate any special functionality. When we run the cell, Jupyter realizes it should recognize the code as SQL.

In [22]:
%%sql
SELECT type
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[22]:
type
entree
entree
entree
appetizer
appetizer
appetizer
dessert
In [23]:
%%sql
SELECT type
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[23]:
type
appetizer
dessert
entree
In [24]:
%%sql
SELECT type, SUM(cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[24]:
type SUM(cost)
appetizer 12
dessert 5
entree 30
In [25]:
%%sql
SELECT type, 
       SUM(cost), 
       MIN(cost),
       MAX(name)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[25]:
type SUM(cost) MIN(cost) MAX(name)
appetizer 12 4 potsticker
dessert 5 5 ice cream
entree 30 7 taco
In [26]:
%%sql
SELECT year, COUNT(cute)
FROM Dragon
GROUP BY year;
 * sqlite:///data/basic_examples.db
Done.
Out[26]:
year COUNT(cute)
2010 2
2011 1
2019 1
In [27]:
%%sql
SELECT year, COUNT(*)
FROM Dragon
GROUP BY year;
 * sqlite:///data/basic_examples.db
Done.
Out[27]:
year COUNT(*)
2010 2
2011 2
2019 1