Lecture 6 – Data Cleaning and EDA
Presented by Anthony D. Joseph
Content by Anthony D. Joseph, Joseph Gonzalez, Deborah Nolan, Joseph Hellerstein
A reminder – the right column of the table below contains Quick Checks. These are not required, but they are suggested to help you check your understanding.
| Video | Quick Check |
|---|---|
| 6.0 Introduction. | |
| 6.1 Exploratory data analysis and its position in the data science lifecycle. The relationship between data cleaning and EDA. | 6.1 |
| 6.2 Exploring various data storage formats and their tradeoffs. | 6.2 |
| 6.3 Primary keys and foreign keys. Eliminating redundancy in tables. | 6.3 |
| 6.4 Defining and discussing the terms quantitative discrete, quantitative continuous, qualitative ordinal, and qualitative nominal. | 6.4 |
| 6.5 Discussing the granularity and scope of our data to ensure that it's appropriate for analysis. Discussing various methods of encoding time and the flaws to be aware of. | 6.5 |
| 6.6 Ways in which our data can be incorrect or corrupt. Different methods for addressing missing values, and their tradeoffs. | 6.6 |
| 6.7 Summarizing the process of EDA. | 6.7 |
| (Optional) 6.8 A demo of EDA on real data. | N/A |