Lecture 21: Introduction to SQL¶

Data 100, Spring 2023

Acknowledgments Page

SQL Demo¶

Before we look at SQL syntax in detail, let's first see whether we can understand the rough idea of SQL.

1. Load the sql Module.. Load %%sql cell magic.

In [1]:
%load_ext sql

2. Connect to a Database. SQLite database basic_examples.db:

In [2]:
%%sql
sqlite:///data/basic_examples.db


3. Run a simple SQL query. Note the %%sql lets Jupyter parse the rest of the lines as a SQL command.

In [3]:
%%sql
SELECT * FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[3]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0

Simple query, this time on two different lines.

In [4]:
%%sql
SELECT *
FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[4]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0


Another simple query.

In [5]:
%%sql
SELECT *
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[5]:
name type cost
ravioli entree 10
ramen entree 13
taco entree 7
edamame appetizer 4
fries appetizer 4
potsticker appetizer 4
ice cream dessert 5


The last SQL query in today's lecture.

  • How much can you understand right now?
  • Can you get a sense of how the SQL query is written?

(Note: type is not a keyword here; it is a column name. But Jupyter is using Python keyword highlighting, hence the green.)

In [6]:
%%sql
SELECT type, AVG(DISTINCT cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[6]:
type AVG(DISTINCT cost)
appetizer 4.0
dessert 5.0
entree 10.0

Let's try to get all the way to this query by the end of today!

Startup (once more)¶

SQL inside a Python-based Jupyter Notebook:

  • This Notebook focuses on SQL, while using the Python-based IPython kernel.
  • For this, we take advantage of the extensible IPython magic command system, as shown below.
In [7]:
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
In [8]:
# We'll also connect directly to our DB with Pandas later
import sqlalchemy
import pandas as pd

Connection and basic queries¶

Then we will connect to the database. In this lecture example, the database is stored in a single file on our own computer called lec18_basic_examples.db.

Note that starting a cell with %%sql tells Jupyter that you are running SQL code, not Python code.

In actual practice, you'd usually connect to some database server via a network connection, e.g. hosted on some computer on the internet.

In [9]:
%%sql
sqlite:///data/basic_examples.db

Now that we're connected, we can, for example, display the contents of the Dragon table.

In [10]:
%%sql
SELECT * FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[10]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0
In [11]:
%%sql
SELECT * FROM sqlite_master WHERE type='table'
 * sqlite:///data/basic_examples.db
Done.
Out[11]:
type name tbl_name rootpage sql
table sqlite_sequence sqlite_sequence 7 CREATE TABLE sqlite_sequence(name,seq)
table Dragon Dragon 2 CREATE TABLE Dragon (
name TEXT PRIMARY KEY,
year INTEGER CHECK (year >= 2000),
cute INTEGER
)
table Dish Dish 4 CREATE TABLE Dish (
name TEXT PRIMARY KEY,
type TEXT,
cost INTEGER CHECK (cost >= 0)
)
table Scene Scene 6 CREATE TABLE Scene (
id INTEGER PRIMARY KEY AUTOINCREMENT,
biome TEXT NOT NULL,
city TEXT NOT NULL,
visitors INTEGER CHECK (visitors >= 0),
created_at DATETIME DEFAULT (DATETIME('now'))
)
In [12]:
%%sql
SELECT * FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[12]:
name year cute
hiccup 2010 10
drogon 2011 -100
dragon 2 2019 0
In [13]:
%%sql
SELECT cute, year FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[13]:
cute year
10 2010
-100 2011
0 2019
In [14]:
%%sql
SELECT cute AS cuteness,
       year AS birth
FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[14]:
cuteness birth
10 2010
-100 2011
0 2019
In [15]:
%%sql
SELECT cute AS cuteness,
       year AS birth 
FROM Dragon;
 * sqlite:///data/basic_examples.db
Done.
Out[15]:
cuteness birth
10 2010
-100 2011
0 2019
In [16]:
%%sql
SELECT name, year
FROM Dragon
WHERE cute > 0;
 * sqlite:///data/basic_examples.db
Done.
Out[16]:
name year
hiccup 2010
In [17]:
%%sql
SELECT name, cute, year
FROM Dragon
WHERE cute > 0 OR year > 2015;
 * sqlite:///data/basic_examples.db
Done.
Out[17]:
name cute year
hiccup 10 2010
dragon 2 0 2019
In [18]:
%%sql
SELECT *
FROM Dragon
ORDER BY cute DESC;
 * sqlite:///data/basic_examples.db
Done.
Out[18]:
name year cute
hiccup 2010 10
dragon 2 2019 0
drogon 2011 -100
In [19]:
%%sql
SELECT *
FROM Dragon
LIMIT 2;
 * sqlite:///data/basic_examples.db
Done.
Out[19]:
name year cute
hiccup 2010 10
drogon 2011 -100
In [20]:
%%sql
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
 * sqlite:///data/basic_examples.db
Done.
Out[20]:
name year cute
drogon 2011 -100
dragon 2 2019 0

GROUP BY operations¶

In [21]:
%%sql
SELECT *
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[21]:
name type cost
ravioli entree 10
ramen entree 13
taco entree 7
edamame appetizer 4
fries appetizer 4
potsticker appetizer 4
ice cream dessert 5
In [22]:
%%sql
SELECT type
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[22]:
type
entree
entree
entree
appetizer
appetizer
appetizer
dessert
In [23]:
%%sql
SELECT type
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[23]:
type
appetizer
dessert
entree
In [24]:
%%sql
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[24]:
type MAX(cost)
appetizer 4
dessert 5
entree 13
In [25]:
%%sql
SELECT type, 
       SUM(cost), 
       MIN(cost),
       MAX(name)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[25]:
type SUM(cost) MIN(cost) MAX(name)
appetizer 12 4 potsticker
dessert 5 5 ice cream
entree 30 7 taco

Trickier GROUP BY¶

In [26]:
%%sql
SELECT type, COUNT(cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[26]:
type COUNT(cost)
appetizer 3
dessert 1
entree 3
In [27]:
%%sql
SELECT type, COUNT(cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[27]:
type COUNT(cost)
appetizer 3
dessert 1
entree 3
In [28]:
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[28]:
type COUNT(*)
appetizer 3
dessert 1
entree 3
In [29]:
%%sql
SELECT type, cost
FROM Dish
GROUP BY type, cost;
 * sqlite:///data/basic_examples.db
Done.
Out[29]:
type cost
appetizer 4
dessert 5
entree 7
entree 10
entree 13

Exercise: GROUP BY Multiple Columns¶

In [30]:
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
In [31]:
%%sql
sqlite:///data/basic_examples.db
In [32]:
%%sql
-- A --
SELECT type, cost
FROM Dish
GROUP BY type, cost, COUNT(*);
 * sqlite:///data/basic_examples.db
(sqlite3.OperationalError) aggregate functions are not allowed in the GROUP BY clause
[SQL: -- A --
SELECT type, cost
FROM Dish
GROUP BY type, cost, COUNT(*);]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
In [33]:
%%sql
-- B --
SELECT type, cost, COUNT(*)
FROM Dish
GROUP BY type, cost;
 * sqlite:///data/basic_examples.db
Done.
Out[33]:
type cost COUNT(*)
appetizer 4 3
dessert 5 1
entree 7 1
entree 10 1
entree 13 1
In [34]:
%%sql
SELECT type, cost, COUNT(*) as size
FROM Dish
GROUP BY type, cost;
 * sqlite:///data/basic_examples.db
Done.
Out[34]:
type cost size
appetizer 4 3
dessert 5 1
entree 7 1
entree 10 1
entree 13 1

Remember you can rename columns if you want, so instead of seeing a column named COUNT(*), we get something more descriptive:

In [35]:
%%sql
SELECT type, cost, COUNT(*) as size
FROM Dish
GROUP BY type, cost;
 * sqlite:///data/basic_examples.db
Done.
Out[35]:
type cost size
appetizer 4 3
dessert 5 1
entree 7 1
entree 10 1
entree 13 1
In [36]:
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[36]:
type COUNT(*)
appetizer 3
dessert 1
entree 3
In [37]:
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type
HAVING MAX(cost) < 8
 * sqlite:///data/basic_examples.db
Done.
Out[37]:
type COUNT(*)
appetizer 3
dessert 1
In [38]:
%%sql
SELECT type, COUNT(*)
FROM Dish
WHERE cost < 8
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[38]:
type COUNT(*)
appetizer 3
dessert 1
entree 1

DISTINCT¶

In [39]:
%%sql
SELECT DISTINCT cost
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[39]:
cost
10
13
7
4
5
In [40]:
%%sql
SELECT DISTINCT type 
FROM Dish
WHERE cost < 11;
 * sqlite:///data/basic_examples.db
Done.
Out[40]:
type
entree
appetizer
dessert
In [41]:
%%sql
SELECT DISTINCT type, cost 
FROM Dish
WHERE cost < 11;
 * sqlite:///data/basic_examples.db
Done.
Out[41]:
type cost
entree 10
entree 7
appetizer 4
dessert 5
In [42]:
%%sql
SELECT DISTINCT type, cost 
FROM Dish;
 * sqlite:///data/basic_examples.db
Done.
Out[42]:
type cost
entree 10
entree 13
entree 7
appetizer 4
dessert 5
In [43]:
%%sql
SELECT type, cost
FROM Dish
GROUP BY type, cost;
 * sqlite:///data/basic_examples.db
Done.
Out[43]:
type cost
appetizer 4
dessert 5
entree 7
entree 10
entree 13
In [44]:
%%sql
SELECT type, AVG(cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[44]:
type AVG(cost)
appetizer 4.0
dessert 5.0
entree 10.0
In [45]:
%%sql
SELECT type, AVG(DISTINCT cost)
FROM Dish
GROUP BY type;
 * sqlite:///data/basic_examples.db
Done.
Out[45]:
type AVG(DISTINCT cost)
appetizer 4.0
dessert 5.0
entree 10.0

Python-SQL combinations¶

In [46]:
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
In [47]:
# We'll also connect directly to our DB with Pandas later
import sqlalchemy
import pandas as pd
In [48]:
%%sql
sqlite:///data/basic_examples.db

Using the syntax

%%sql PYTHON_VARIABLE <<
BODY OF QUERY

lets us capture the result of the SQL statement into a Python variable. The docs provide more details.

For example, let's run the same query above but storing the output into a variable named drag2:

In [49]:
%%sql drag2 << 
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
 * sqlite:///data/basic_examples.db
Done.
Returning data to local variable drag2

We can now see this variable and what type it is. While it looks a lot like a data frame, it's not one:

In [50]:
drag2
Out[50]:
name year cute
drogon 2011 -100
dragon 2 2019 0
In [51]:
type(drag2)
Out[51]:
sql.run.ResultSet

Pandas-SQL interplay¶

First, the above SQL results can be easily converted into true Pandas DataFrames, by calling their DataFrame method:

In [52]:
drag2.DataFrame()
Out[52]:
name year cute
0 drogon 2011 -100
1 dragon 2 2019 0

But we can also use Pandas to connect to the same database, in pure Python. Sometimes this may be more convenient than using the SQL magic syntax - in the homework you'll practice some of this as well. You should be familiar with both methods, and use the one that best fits your needs.

In [53]:
engine = sqlalchemy.create_engine("sqlite:///data/lec18_basic_examples.db")
connection = engine.connect()
In [54]:
query = """
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
"""

As you can see, both the direct pandas query, and converting drag2 to a DataFrame, produce identical results (as they should!).

Finally, now that we have the query variable with the contents of our query, we can use it directly in the SQL magic too! It supports {var} syntax (as well as $var and will expand the python var variable into its value before making the query:

In [55]:
query = """
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
"""
In [56]:
%%sql
$query
 * sqlite:///data/basic_examples.db
Done.
Out[56]:
name year cute
drogon 2011 -100
dragon 2 2019 0
In [57]:
%%sql
$query
 * sqlite:///data/basic_examples.db
Done.
Out[57]:
name year cute
drogon 2011 -100
dragon 2 2019 0