Principles and Techniques of Data Science
UC Berkeley, Summer 2022
Anirudhan Badrinath
- Frequently Asked Questions: Before posting on the class Ed, please read the class FAQ page.
- The Syllabus contains a detailed explanation of how each course component will work this summer, given that the course is being taught entirely online.
- Textbook readings are optional and still under active development. See Resources for more details.
- Note: The schedule of lectures and assignments is subject to change.
Schedule
Week 1
- Jun 21
Lab 1 Prerequisite Coding (due Jun 27)
Lab 2 Pandas (due Jun 27)
- Jun 22
Lecture 2 Pandas I
Textbook: Pandas Reference Table
Reference: Pandas API Documentation
- Jun 23
Lecture 3 Pandas II
Discussion 0 (Optional) Fundamentals
Discussion 1 Sampling and Probability, Pandas, code
Homework 1 Intro + Prerequisites (due Jun 27)
- Jun 24
Exam Prep 1 Sampling and Probability, Pandas
Week 2
- Jun 27
Lecture 4 Data Cleaning, EDA
Weekly Check 2
Lab 3 Data Cleaning and EDA (due Jul 2)
Lab 4 Transformations and KDE (due Jul 2)
Homework 2 Food Safety (due Jun 30)
- Jun 28
Lecture 5 Regex
Discussion 2 Pandas, Data Cleaning
- Jun 29
Lecture 6 Visualization I
Textbook: Seaborn Reference Table
Textbook: Matplotlib Reference Table
- Jun 30
Lecture 7 Visualization II
Discussion 3 Regex, Visualization
Homework 3 Tweets (due Jul 5)
- Jul 1
Exam Prep 2 Pandas, Visualization, Regex
Catch-up section 1
Week 3
- Jul 4
Independence Day
Weekly Check 3
Lab 5 Modeling, Loss Functions, and Summary Statistics (due Jul 9)
Lab 6 Linear Regression (due Jul 9)
Homework 4 Bike Sharing (due Jul 7)
- Jul 5
Discussion 4 Modeling and Visualization
- Jul 6
- Jul 7
Lecture 10 Ordinary Least Squares (Multiple Linear Regression)
Discussion 5 Linear Model and Loss Function
Homework 5 Regression (On paper) (due Jul 11)
- Jul 8
Exam Prep 3 SLR & OLS
Catch-up section 2
Week 4
- Jul 11
Lecture 11 Gradient Descent, sklearn
Weekly Check 4
Lab 7 Gradient Descent and Feature Engineering (due Jul 16)
Lab 8 Model Selection, Regularization, and Cross-Validation (due Jul 16)
Proj 1A Housing I (due Jul 14)
- Jul 12
Lecture 12 Gradient Descent, Feature Engineering
Discussion 6 Geometry of Least Squares, Gradient Descent, HCE
- Jul 13
Lecture 13 Cross-Validation and Regularization
- Jul 14
Discussion 7 HCE, OHE, Ridge and Lasso Linear Regression
Proj 1B Housing II (due Jul 25)
- Jul 15
Midterm Review
Exam Prep 4 Gradient Descent, Weighted Least Squares
Solution, no recording
Catch-up section 3 Cancelled
Week 5
- Jul 18
Midterm Midterm Exam
Weekly Check 5
Lab 9 Probability and Modeling (due Jul 23)
- Jul 19
Break (No Lecture)
- Jul 20
Lecture 15 Probability I: Random Variables
- Jul 21
Discussion 8 Probability and Bias-Variance Trade-off
- Jul 22
Exam Prep 5 CV, Probability, BVT
Solution, no recording
Catch-up section 4 TBD
Recording
Week 6
- Jul 25
Lecture 17 SQL I
Weekly Check 6
Lab 10 SQL (due Jul 30)
Lab 11 PCA (due Jul 30)
Homework 6 Probability and Estimators: coding, written (PDF), written (LaTeX) (due Jul 28)
- Jul 26
Lecture 18 SQL II and PCA I
Discussion 9 BVT & SQL
- Jul 27
Lecture 19 PCA II
- Jul 28
Break (No Lecture)
Discussion 10 SQL & PCA
Homework 7 SQL and PCA (due Aug 1)
- Jul 29
Exam Prep 6 SQL & PCA
Catch-up section 5
Week 7
- Aug 1
Lecture 20 Classification and Logistic Regression I
Weekly Check 7
Lab 12 Logistic Regression (due Aug 6)
Lab 13 Decision Trees & Random Forests (due Aug 6)
Proj 2A Spam I (due Aug 4)
- Aug 2
Lecture 21 Logistic Regression II
Discussion 11 Logistic Regression
- Aug 3
Lecture 22 Decision Trees
- Aug 4
Lecture 23 Clustering
Discussion 12 Decision Trees, Clustering
Proj 2B Spam II (due Aug 8)
- Aug 5
Exam Prep 7 Classifier and Clustering
Catch-up section 6
Week 8
- Aug 8
Weekly Check 8
- Aug 9
- Aug 10
- Aug 11
Optional Lecture Neural Networks
Review
- Aug 12
Final Final Exam