Lecture 7 – Data Cleaning and EDA
Presented by Anthony D. Joseph
Content by Anthony D. Joseph, Joseph Gonzalez, Deborah Nolan, Joseph Hellerstein
A reminder – the right column of the table below contains Quick Checks. These are not required, but we suggest them as a way to check your understanding.
Video | Quick Check
---|---
7.0 Announcements. | |
7.1 Exploratory data analysis and its position in the data science lifecycle. The relationship between data cleaning and EDA. | 7.1
7.2 Exploring different data storage formats and their tradeoffs. | 7.2
7.3 Primary keys and foreign keys. Eliminating redundancy in tables. | 7.3
7.4 Defining and discussing the terms quantitative discrete, quantitative continuous, qualitative ordinal, and qualitative nominal. | 7.4
7.5 Discussing the granularity and scope of our data to ensure that it's appropriate for analysis. Discussing various methods of encoding time, and flaws to be aware of. | 7.5
7.6 Ways in which our data can be incorrect or corrupt. Different methods for addressing missing values, and their tradeoffs. | 7.6
7.7 Summarizing the process of EDA. | 7.7
(Optional) 7.8 A demo of EDA on real data. | N/A