Before we look at SQL syntax in detail, let's first see whether we can understand the rough idea of SQL.
1. Load the sql
Module.. Load %%sql
cell magic.
%load_ext sql
2. Connect to a Database. SQLite database basic_examples.db
:
%%sql
sqlite:///data/basic_examples.db
3. Run a simple SQL query. Note the %%sql
lets Jupyter parse the rest of the lines as a SQL command.
%%sql
SELECT * FROM Dragon;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
Simple query, this time on two different lines.
%%sql
SELECT *
FROM Dragon;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
Another simple query.
%%sql
SELECT *
FROM Dish;
* sqlite:///data/basic_examples.db Done.
name | type | cost |
---|---|---|
ravioli | entree | 10 |
ramen | entree | 13 |
taco | entree | 7 |
edamame | appetizer | 4 |
fries | appetizer | 4 |
potsticker | appetizer | 4 |
ice cream | dessert | 5 |
The last SQL query in today's lecture.
(Note: type
is not a keyword here; it is a column name. But Jupyter is using Python keyword highlighting, hence the green.)
%%sql
SELECT type, AVG(DISTINCT cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | AVG(DISTINCT cost) |
---|---|
appetizer | 4.0 |
dessert | 5.0 |
entree | 10.0 |
Let's try to get all the way to this query by the end of today!
SQL inside a Python-based Jupyter Notebook:
%load_ext sql
The sql extension is already loaded. To reload it, use: %reload_ext sql
# We'll also connect directly to our DB with Pandas later
import sqlalchemy
import pandas as pd
Then we will connect to the database. In this lecture example, the database is stored in a single file on our own computer called lec18_basic_examples.db.
Note that starting a cell with %%sql
tells Jupyter that you are running SQL code, not Python code.
In actual practice, you'd usually connect to some database server via a network connection, e.g. hosted on some computer on the internet.
%%sql
sqlite:///data/basic_examples.db
Now that we're connected, we can, for example, display the contents of the Dragon table.
%%sql
SELECT * FROM Dragon;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
%%sql
SELECT * FROM sqlite_master WHERE type='table'
* sqlite:///data/basic_examples.db Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 7 | CREATE TABLE sqlite_sequence(name,seq) |
table | Dragon | Dragon | 2 | CREATE TABLE Dragon ( name TEXT PRIMARY KEY, year INTEGER CHECK (year >= 2000), cute INTEGER ) |
table | Dish | Dish | 4 | CREATE TABLE Dish ( name TEXT PRIMARY KEY, type TEXT, cost INTEGER CHECK (cost >= 0) ) |
table | Scene | Scene | 6 | CREATE TABLE Scene ( id INTEGER PRIMARY KEY AUTOINCREMENT, biome TEXT NOT NULL, city TEXT NOT NULL, visitors INTEGER CHECK (visitors >= 0), created_at DATETIME DEFAULT (DATETIME('now')) ) |
%%sql
SELECT * FROM Dragon;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
%%sql
SELECT cute, year FROM Dragon;
* sqlite:///data/basic_examples.db Done.
cute | year |
---|---|
10 | 2010 |
-100 | 2011 |
0 | 2019 |
%%sql
SELECT cute AS cuteness,
year AS birth
FROM Dragon;
* sqlite:///data/basic_examples.db Done.
cuteness | birth |
---|---|
10 | 2010 |
-100 | 2011 |
0 | 2019 |
%%sql
SELECT cute AS cuteness,
year AS birth
FROM Dragon;
* sqlite:///data/basic_examples.db Done.
cuteness | birth |
---|---|
10 | 2010 |
-100 | 2011 |
0 | 2019 |
%%sql
SELECT name, year
FROM Dragon
WHERE cute > 0;
* sqlite:///data/basic_examples.db Done.
name | year |
---|---|
hiccup | 2010 |
%%sql
SELECT name, cute, year
FROM Dragon
WHERE cute > 0 OR year > 2015;
* sqlite:///data/basic_examples.db Done.
name | cute | year |
---|---|---|
hiccup | 10 | 2010 |
dragon 2 | 0 | 2019 |
%%sql
SELECT *
FROM Dragon
ORDER BY cute DESC;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
dragon 2 | 2019 | 0 |
drogon | 2011 | -100 |
%%sql
SELECT *
FROM Dragon
LIMIT 2;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
hiccup | 2010 | 10 |
drogon | 2011 | -100 |
%%sql
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
%%sql
SELECT *
FROM Dish;
* sqlite:///data/basic_examples.db Done.
name | type | cost |
---|---|---|
ravioli | entree | 10 |
ramen | entree | 13 |
taco | entree | 7 |
edamame | appetizer | 4 |
fries | appetizer | 4 |
potsticker | appetizer | 4 |
ice cream | dessert | 5 |
%%sql
SELECT type
FROM Dish;
* sqlite:///data/basic_examples.db Done.
type |
---|
entree |
entree |
entree |
appetizer |
appetizer |
appetizer |
dessert |
%%sql
SELECT type
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type |
---|
appetizer |
dessert |
entree |
%%sql
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | MAX(cost) |
---|---|
appetizer | 4 |
dessert | 5 |
entree | 13 |
%%sql
SELECT type,
SUM(cost),
MIN(cost),
MAX(name)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | SUM(cost) | MIN(cost) | MAX(name) |
---|---|---|---|
appetizer | 12 | 4 | potsticker |
dessert | 5 | 5 | ice cream |
entree | 30 | 7 | taco |
%%sql
SELECT type, COUNT(cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | COUNT(cost) |
---|---|
appetizer | 3 |
dessert | 1 |
entree | 3 |
%%sql
SELECT type, COUNT(cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | COUNT(cost) |
---|---|
appetizer | 3 |
dessert | 1 |
entree | 3 |
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | COUNT(*) |
---|---|
appetizer | 3 |
dessert | 1 |
entree | 3 |
%%sql
SELECT type, cost
FROM Dish
GROUP BY type, cost;
* sqlite:///data/basic_examples.db Done.
type | cost |
---|---|
appetizer | 4 |
dessert | 5 |
entree | 7 |
entree | 10 |
entree | 13 |
%load_ext sql
The sql extension is already loaded. To reload it, use: %reload_ext sql
%%sql
sqlite:///data/basic_examples.db
%%sql
-- A --
SELECT type, cost
FROM Dish
GROUP BY type, cost, COUNT(*);
* sqlite:///data/basic_examples.db (sqlite3.OperationalError) aggregate functions are not allowed in the GROUP BY clause [SQL: -- A -- SELECT type, cost FROM Dish GROUP BY type, cost, COUNT(*);] (Background on this error at: https://sqlalche.me/e/14/e3q8)
%%sql
-- B --
SELECT type, cost, COUNT(*)
FROM Dish
GROUP BY type, cost;
* sqlite:///data/basic_examples.db Done.
type | cost | COUNT(*) |
---|---|---|
appetizer | 4 | 3 |
dessert | 5 | 1 |
entree | 7 | 1 |
entree | 10 | 1 |
entree | 13 | 1 |
%%sql
SELECT type, cost, COUNT(*) as size
FROM Dish
GROUP BY type, cost;
* sqlite:///data/basic_examples.db Done.
type | cost | size |
---|---|---|
appetizer | 4 | 3 |
dessert | 5 | 1 |
entree | 7 | 1 |
entree | 10 | 1 |
entree | 13 | 1 |
Remember you can rename columns if you want, so instead of seeing a column named COUNT(*)
, we get something more descriptive:
%%sql
SELECT type, cost, COUNT(*) as size
FROM Dish
GROUP BY type, cost;
* sqlite:///data/basic_examples.db Done.
type | cost | size |
---|---|---|
appetizer | 4 | 3 |
dessert | 5 | 1 |
entree | 7 | 1 |
entree | 10 | 1 |
entree | 13 | 1 |
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | COUNT(*) |
---|---|
appetizer | 3 |
dessert | 1 |
entree | 3 |
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type
HAVING MAX(cost) < 8
* sqlite:///data/basic_examples.db Done.
type | COUNT(*) |
---|---|
appetizer | 3 |
dessert | 1 |
%%sql
SELECT type, COUNT(*)
FROM Dish
WHERE cost < 8
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | COUNT(*) |
---|---|
appetizer | 3 |
dessert | 1 |
entree | 1 |
%%sql
SELECT DISTINCT cost
FROM Dish;
* sqlite:///data/basic_examples.db Done.
cost |
---|
10 |
13 |
7 |
4 |
5 |
%%sql
SELECT DISTINCT type
FROM Dish
WHERE cost < 11;
* sqlite:///data/basic_examples.db Done.
type |
---|
entree |
appetizer |
dessert |
%%sql
SELECT DISTINCT type, cost
FROM Dish
WHERE cost < 11;
* sqlite:///data/basic_examples.db Done.
type | cost |
---|---|
entree | 10 |
entree | 7 |
appetizer | 4 |
dessert | 5 |
%%sql
SELECT DISTINCT type, cost
FROM Dish;
* sqlite:///data/basic_examples.db Done.
type | cost |
---|---|
entree | 10 |
entree | 13 |
entree | 7 |
appetizer | 4 |
dessert | 5 |
%%sql
SELECT type, cost
FROM Dish
GROUP BY type, cost;
* sqlite:///data/basic_examples.db Done.
type | cost |
---|---|
appetizer | 4 |
dessert | 5 |
entree | 7 |
entree | 10 |
entree | 13 |
%%sql
SELECT type, AVG(cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | AVG(cost) |
---|---|
appetizer | 4.0 |
dessert | 5.0 |
entree | 10.0 |
%%sql
SELECT type, AVG(DISTINCT cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
type | AVG(DISTINCT cost) |
---|---|
appetizer | 4.0 |
dessert | 5.0 |
entree | 10.0 |
%load_ext sql
The sql extension is already loaded. To reload it, use: %reload_ext sql
# We'll also connect directly to our DB with Pandas later
import sqlalchemy
import pandas as pd
%%sql
sqlite:///data/basic_examples.db
Using the syntax
%%sql PYTHON_VARIABLE <<
BODY OF QUERY
lets us capture the result of the SQL statement into a Python variable. The docs provide more details.
For example, let's run the same query above but storing the output into a variable named drag2
:
%%sql drag2 <<
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
* sqlite:///data/basic_examples.db Done. Returning data to local variable drag2
We can now see this variable and what type it is. While it looks a lot like a data frame, it's not one:
drag2
name | year | cute |
---|---|---|
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
type(drag2)
sql.run.ResultSet
First, the above SQL results can be easily converted into true Pandas DataFrames, by calling their DataFrame
method:
drag2.DataFrame()
name | year | cute | |
---|---|---|---|
0 | drogon | 2011 | -100 |
1 | dragon 2 | 2019 | 0 |
But we can also use Pandas to connect to the same database, in pure Python. Sometimes this may be more convenient than using the SQL magic syntax - in the homework you'll practice some of this as well. You should be familiar with both methods, and use the one that best fits your needs.
engine = sqlalchemy.create_engine("sqlite:///data/lec18_basic_examples.db")
connection = engine.connect()
query = """
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
"""
As you can see, both the direct pandas query, and converting drag2
to a DataFrame, produce identical results (as they should!).
Finally, now that we have the query
variable with the contents of our query, we can use it directly in the SQL magic too! It supports {var}
syntax (as well as $var
and will expand the python var
variable into its value before making the query:
query = """
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
"""
%%sql
$query
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |
%%sql
$query
* sqlite:///data/basic_examples.db Done.
name | year | cute |
---|---|---|
drogon | 2011 | -100 |
dragon 2 | 2019 | 0 |