Before we look at SQL syntax in detail, let's first get ourselves set up to run SQL queries in Jupyter.
1. Load the sql Module.
Load %%sql cell magic.
%load_ext sql
The sql extension is already loaded. To reload it, use: %reload_ext sql
2. Connect to a database.
Here, we connect to the SQLite database basic_examples.db.
%%sql
sqlite:///data/basic_examples.db
3. Run a simple SQL query.
Note the %%sql lets Jupyter parse the rest of the lines as a SQL command.
%%sql
SELECT * FROM Dragon;
* sqlite:///data/basic_examples.db Done.
| name | year | cute |
|---|---|---|
| hiccup | 2010 | 10 |
| drogon | 2011 | -100 |
| dragon 2 | 2019 | 0 |
| puff | 2010 | 100 |
| smaug | 2011 | None |
Simple query, this time on two different lines.
%%sql
SELECT *
FROM Dragon;
* sqlite:///data/basic_examples.db Done.
| name | year | cute |
|---|---|---|
| hiccup | 2010 | 10 |
| drogon | 2011 | -100 |
| dragon 2 | 2019 | 0 |
| puff | 2010 | 100 |
| smaug | 2011 | None |
pd.read_sql¶It turns out that pandas has a special-purpose function to parse SQL queries. We can pass in a SQL query as a string to return a pandas DataFrame. To achieve the same result as we did using cell magic above, we can do the following.
1. Connect to a database
import sqlalchemy
import pandas as pd
engine = sqlalchemy.create_engine("sqlite:///data/basic_examples.db")
connection = engine.connect()
2. Run a simple SQL query
query = """
SELECT *
FROM Dragon;
"""
pd.read_sql(query, engine)
| name | year | cute | |
|---|---|---|---|
| 0 | hiccup | 2010 | 10.0 |
| 1 | drogon | 2011 | -100.0 |
| 2 | dragon 2 | 2019 | 0.0 |
| 3 | puff | 2010 | 100.0 |
| 4 | smaug | 2011 | NaN |
A database contains a collection of SQL tables. Let's connect to our "toy" database basic_examples.db and explore the tables it stores.
%%sql
sqlite:///data/basic_examples.db
%%sql
SELECT * FROM sqlite_master WHERE type='table'
* sqlite:///data/basic_examples.db Done.
| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 7 | CREATE TABLE sqlite_sequence(name,seq) |
| table | Dragon | Dragon | 2 | CREATE TABLE Dragon ( name TEXT PRIMARY KEY, year INTEGER CHECK (year >= 2000), cute INTEGER ) |
| table | Dish | Dish | 4 | CREATE TABLE Dish ( name TEXT PRIMARY KEY, type TEXT, cost INTEGER CHECK (cost >= 0) ) |
| table | Scene | Scene | 6 | CREATE TABLE Scene ( id INTEGER PRIMARY KEY AUTOINCREMENT, biome TEXT NOT NULL, city TEXT NOT NULL, visitors INTEGER CHECK (visitors >= 0), created_at DATETIME DEFAULT (DATETIME('now')) ) |
Every SQL query must contain a SELECT and FROM clause.
SELECT: specify the column(s) to return in the outputFROM: specify the database table from which to extract data%%sql
SELECT * FROM Dragon;
* sqlite:///data/basic_examples.db Done.
| name | year | cute |
|---|---|---|
| hiccup | 2010 | 10 |
| drogon | 2011 | -100 |
| dragon 2 | 2019 | 0 |
| puff | 2010 | 100 |
| smaug | 2011 | None |
%%sql
SELECT cute, year FROM Dragon;
* sqlite:///data/basic_examples.db Done.
| cute | year |
|---|---|
| 10 | 2010 |
| -100 | 2011 |
| 0 | 2019 |
| 100 | 2010 |
| None | 2011 |
Aliasing with AS
%%sql
SELECT cute AS cuteness,
year AS birth
FROM Dragon;
* sqlite:///data/basic_examples.db Done.
| cuteness | birth |
|---|---|
| 10 | 2010 |
| -100 | 2011 |
| 0 | 2019 |
| 100 | 2010 |
| None | 2011 |
Uniqueness with DISTINCT
%%sql
SELECT DISTINCT year
FROM Dragon;
* sqlite:///data/basic_examples.db Done.
| year |
|---|
| 2010 |
| 2011 |
| 2019 |
Filtering with WHERE
%%sql
SELECT name, year
FROM Dragon
WHERE cute > 0;
* sqlite:///data/basic_examples.db Done.
| name | year |
|---|---|
| hiccup | 2010 |
| puff | 2010 |
%%sql
SELECT name, cute, year
FROM Dragon
WHERE cute > 0 OR year > 2013;
* sqlite:///data/basic_examples.db Done.
| name | cute | year |
|---|---|---|
| hiccup | 10 | 2010 |
| dragon 2 | 0 | 2019 |
| puff | 100 | 2010 |
%%sql
SELECT name, year
FROM Dragon
WHERE name IN ("puff", "hiccup");
* sqlite:///data/basic_examples.db Done.
| name | year |
|---|---|
| hiccup | 2010 |
| puff | 2010 |
%%sql
SELECT name, cute
FROM Dragon
WHERE cute IS NOT NULL;
* sqlite:///data/basic_examples.db Done.
| name | cute |
|---|---|
| hiccup | 10 |
| drogon | -100 |
| dragon 2 | 0 |
| puff | 100 |
Ordering data using ORDER BY
%%sql
SELECT *
FROM Dragon
ORDER BY cute DESC;
* sqlite:///data/basic_examples.db Done.
| name | year | cute |
|---|---|---|
| puff | 2010 | 100 |
| hiccup | 2010 | 10 |
| dragon 2 | 2019 | 0 |
| drogon | 2011 | -100 |
| smaug | 2011 | None |
Restricting output with LIMIT and OFFSET
%%sql
SELECT *
FROM Dragon
LIMIT 2;
* sqlite:///data/basic_examples.db Done.
| name | year | cute |
|---|---|---|
| hiccup | 2010 | 10 |
| drogon | 2011 | -100 |
%%sql
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
* sqlite:///data/basic_examples.db Done.
| name | year | cute |
|---|---|---|
| drogon | 2011 | -100 |
| dragon 2 | 2019 | 0 |
GROUP BY¶%%sql
SELECT *
FROM Dish;
* sqlite:///data/basic_examples.db Done.
| name | type | cost |
|---|---|---|
| ravioli | entree | 10 |
| ramen | entree | 13 |
| taco | entree | 7 |
| edamame | appetizer | 4 |
| fries | appetizer | 4 |
| potsticker | appetizer | 4 |
| ice cream | dessert | 5 |
A small note: the fact that type is highlighted in green below is a consequence of Jupyter assuming that we are writing Python code (where type is a built-in keyword). type does not have a special meaning in SQL, so the color below does not indicate any special functionality. When we run the cell, Jupyter realizes it should recognize the code as SQL.
%%sql
SELECT type
FROM Dish;
* sqlite:///data/basic_examples.db Done.
| type |
|---|
| entree |
| entree |
| entree |
| appetizer |
| appetizer |
| appetizer |
| dessert |
%%sql
SELECT type
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
| type |
|---|
| appetizer |
| dessert |
| entree |
%%sql
SELECT type, SUM(cost)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
| type | SUM(cost) |
|---|---|
| appetizer | 12 |
| dessert | 5 |
| entree | 30 |
%%sql
SELECT type,
SUM(cost),
MIN(cost),
MAX(name)
FROM Dish
GROUP BY type;
* sqlite:///data/basic_examples.db Done.
| type | SUM(cost) | MIN(cost) | MAX(name) |
|---|---|---|---|
| appetizer | 12 | 4 | potsticker |
| dessert | 5 | 5 | ice cream |
| entree | 30 | 7 | taco |
%%sql
SELECT year, COUNT(cute)
FROM Dragon
GROUP BY year;
* sqlite:///data/basic_examples.db Done.
| year | COUNT(cute) |
|---|---|
| 2010 | 2 |
| 2011 | 1 |
| 2019 | 1 |
%%sql
SELECT year, COUNT(*)
FROM Dragon
GROUP BY year;
* sqlite:///data/basic_examples.db Done.
| year | COUNT(*) |
|---|---|
| 2010 | 2 |
| 2011 | 2 |
| 2019 | 1 |