Lecture 7 – Data Cleaning and EDA
by Joseph Gonzalez (Spring 2020)
Make sure to complete the Quick Check questions between videos. These are ungraded, but it’s in your best interest to do them.
Exploratory data analysis and its position in the data science lifecycle. The relationship between data cleaning and EDA.
Exploring different data storage formats and their tradeoffs.
Primary keys and foreign keys. Eliminating redundancy in tables.
Defining and discussing the terms quantitative discrete, quantitative continuous, qualitative ordinal, qualitative nominal.
Discussing the granularity and scope of our data to ensure that they are appropriate for analysis. Discussing various methods of encoding time, and pitfalls to be aware of.
Ways in which our data can be incorrect or corrupt. Different methods for addressing missing values, and their tradeoffs.
Summarizing the process of EDA, and a demo of EDA on real data.
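As a rough illustration of the missing-value tradeoffs mentioned in the list above, the sketch below compares two common approaches, dropping incomplete records and mean imputation, on made-up data. The `readings` list and the helper names `drop_missing` and `impute_mean` are hypothetical and are not taken from the lecture; it uses only the Python standard library.

```python
from statistics import mean, pvariance

# Hypothetical measurements; None marks a missing value (made-up data).
readings = [10.0, None, 14.0, 12.0, None]

def drop_missing(values):
    """Keep only the observed values (this shrinks the sample)."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]

dropped = drop_missing(readings)
imputed = impute_mean(readings)

# Compare sample size, mean, and spread under each approach.
print(len(dropped), mean(dropped), pvariance(dropped))
print(len(imputed), mean(imputed), pvariance(imputed))
```

Dropping keeps only real observations but reduces the sample size. Mean imputation preserves the sample size and the mean, but it understates the spread of the data, which is one reason the choice of method depends on the analysis.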