SQL Common Errors
Common Misconceptions
How to read SQL Errors
Because Jupyter notebooks run Python natively, we need to import extra modules like sql
, sqlalchemy
, and duckdb
in order to run SQL queries within our notebook. This results in slightly more complex error messages than normal Python. Refer to the image below if you need help identifying the most important parts (the red and blue text).
Sometimes, SQL errors may show up as a long string without any red highlight, for example:
(duckdb.duckdb.BinderException) Binder Error: Referenced column "some_column" not found in FROM clause! LINE 1: SELECT some_column ^
...
Note that this is the same error as the one above – it’s just not formatted nicely. Try your best to parse it; you can even copy the error into a new cell and manually indent sentences if it helps with readability!
Filtering with multiple conditions
When you are using more than one condition in filtering clauses like WHERE
or HAVING
, it’s important consider that the ordering of OR
s and AND
s will affect your SQL output.
In SQL
, the operator precedence typically evaluates AND
operations before OR
operations, and so incorrectly ordered conditions can lead to unexpected results. To ensure your conditions are evaluated in the intended order, use parentheses to group them explicitly.
For example, consider the following query condition:
WHERE A > 500 AND B > 200 OR C < 50
By default, is will be evaluated as
WHERE (A > 500 AND B > 200) OR C < 50
because AND
has a higher precedence than OR
. To evaluate it differently, add parentheses to specify the order of operations:
WHERE A > 500 AND (B > 200 OR C < 50)
Truncated Display Limit
Our Jupyter Notebooks are set to only display the first 10 rows of a table. Otherwise, your notebook would likely crash with the weight of displaying >1000 rows. You’ll likely see a
Truncated to displaylimit of 10
printed right below your SQL query if your table contains more than 10 rows.
Common Errors
NameError: name ‘[res_q#]’ is not defined
This occurs when you have an error in your SQL code, causing the cell after to fail. Because we’re working with Jupyter Notebooks, some SQL errors are not highlighted in red like Python
errors usually are. Check the output under your %%sql
cell to see if it ran smoothly. Any text below
Running query in 'duckdb:///data/{database_name}.db
is an error message that you can read to debug your code. Note that this error could be formatted as one long string without any indents or newlines, so it may be harder than typical Python
errors to parse.
Referenced column "{some_string}" not found
There are 2 potential causes for this error:
- The column you’re looking for does not exist in the table. Make sure that the column names are spelled and capatalized exactly as shown in the assignment. If you are trying to reference a column that is not in the current table, consider using
JOIN
to be able to access these columns. You can reference the SQL II course notes for a refresher on how to use theJOIN
clause. - You used double quotes
""
aroundsome_string
instead of single quotes''
. UnlikePython
, SQL differentiates double and single quotes: double quotes indicate something that exists within the database, like tables and column names; single quotes indicate strings.
Cannot compare values of type __ and type __
This error often occurs when a string-type column (ie. CHAR
or VARCHAR
) is compared with a numeric-type column (ie. INT
, BIGINT
, FLOAT
, REAL
). If you’re interested in learning more about SQL datatypes, check out this link!
To debug, start by checking the type of the column(s) you’re working with; you can either scroll up to the cell (or make a new cell) with the code:
%%sql
SELECT * FROM sqlite_master WHERE type='table';
which outputs the schema for each table. If we’re working with the IMDB database, we get the following output:
We can see that the table Title
contains the columns
tconst
of typeBIGINT
titleType
of typeVARCHAR
primaryTitle
of typeVARCHAR
- and so on…
It’s a good idea to double check the types of the columns involved by looking at the schema before performing numerical comparisons and when deciding whether to CAST
a particular column.
My query runs but I’m not passing the tests
For a lot of questions in the homework, we limit our focus to movie titles. Double check that you are filtering your results appropriately. More generally, if you find your queries aren’t erroring, but are failing the tests due to some logical error, a good starting point is always reading the question again and making a checklist of all the conditions you need to implement.
Comments in SQL
Unlike
Python
, which uses#...
or triple quotes"""..."""
to indicate a comment, SQL comments take the following forms:Because you are still coding within a Python notebook, the colors that Jupyter chooses for your SQL queries may not match up to what you expect; using
#...
will be green text like Python comments, but running the code will give you:Parser Error: syntax error at or near...
Using
--
or/* ... */
may not color-code the text correctly, but SQL will still read it as a comment!