Data 100: Principles and Techniques of Data Science

UC Berkeley, Spring 2024

Joseph E. Gonzalez

He/Him/His

jegonzal@cs.berkeley.edu

Office Hours: Tuesdays from 3:00 to 4:30 in Soda 773 (Starting Jan 23rd)

Narges Norouzi

She/Her/Hers

norouzi@berkeley.edu

Office Hours: Thursdays from 1:00 to 2:30PM in Warren Hall (Room 101-BC)

Welcome to Week 6 of Data 100!

Lectures will be webcast at: https://berkeley.zoom.us/j/91646148607.

Schedule

Week 1

Jan 16
Lecture 1 Introduction
Note 1
Lecture Participation 1
Jan 18
Lecture 2 Pandas I
Note 2
Lecture Participation 2
Jan 19
Lab 1 Prerequisite Coding (due Jan 23)
Homework 1A Plotting and Permutation Test (due Jan 25)
Homework 1B Prerequisite Math (due Jan 25)

Week 2

Jan 23
Lecture 3 Pandas II
Note 3
Lecture Participation 3
Discussion 1 Prerequisites
Solution, Video
Jan 25
Lecture 4 Pandas III
Note 4
Lecture Participation 4
Jan 26
Lab 2A Pandas (due Jan 30)
Homework 2A Food Safety (due Feb 1)

Week 3

Jan 30
Lecture 5 Data Wrangling and EDA (Part 1)
Note 5
Lecture Participation 5
Discussion 2 Pandas I, Worksheet Notebook, Groupwork Notebook
Worksheet Solution, Video
Groupwork Solution, Video
Feb 1
Lecture 6 Data Wrangling and EDA (Part 2) & Text Wrangling and Regex
Note 6
Lecture Participation 6
Exam Prep 1 Pandas
Solution, Video
Feb 2
Lab 2B Data Cleaning and EDA (due Feb 6)
Homework 2B Food Safety II (due Feb 8)

Week 4

Feb 6
Lecture 7 Visualization I
Note 7
Lecture Participation 7
Discussion 3 Pandas II, EDA, Regex
Solution, Video
Feb 8
Lecture 8 Visualization II
Note 8
Lecture Participation 8
Exam Prep 2 Pandas II, Regex
Solution, Video
Feb 9
Lab 3 Regex, EDA (due Feb 13)
Homework 3 Text Analysis of Bloomberg Articles (due Feb 15)

Week 5

Feb 13
Lecture 9 Sampling
Note 9
Lecture Participation 9
Discussion 4 Visualization and Transformation, Worksheet Notebook
Worksheet Solution, Notebook Solution, Video
Feb 15
Lecture 10 Modeling, SLR
Note 10
Lecture Participation 10
Exam Prep 3 Visualization
Solution, Video
Feb 16
Lab 4 Transformations (due Feb 20)
Homework 4 Bike Sharing (due Feb 22)

Week 6

Feb 20
Lecture 11 Constant Model, Loss, and Transformations
Note 11
Lecture Participation 11
Discussion 5 Probability, Sampling, and Simple Linear Regression
Solution
Feb 22
Lecture 12 OLS (Multiple Regression)
Note 12
Lecture Participation 12
Exam Prep 4 Sampling
Feb 23
Lab 5 Modeling, Summary Statistics, and Loss Functions (due Feb 27)
Homework 5 Modeling and Regression (due Feb 29)

Week 7

Feb 27
Lecture 13 Gradient Descent / sklearn

Discussion 6 Models

Feb 29
Lecture 14 Feature Engineering
Mar 1
Lab 6 OLS (due Mar 5)

Week 8

Mar 5
Lecture 15 Case Study (HCE): CCAO
Discussion 7 OLS, Gradient Descent
Mar 7
Lecture 16 No Lecture

Midterm Exam (7-9 PM PST)

Mar 8
Lab 7 Gradient Descent and sklearn (due Mar 12)
Project A1 Housing I (due Mar 14)

Week 9

Mar 12
Lecture 17 Cross-Validation and Regularization

Discussion 8 Feature Engineering, Housing

Mar 14
Lecture 18 Random Variables
Mar 15
Lab 8 Model Selection (due Mar 19)
Project A2 Housing II (due Mar 21)

Week 10

Mar 19
Lecture 19 Estimators, Bias, and Variance

Discussion 9 Cross-Validation and Regularization

Mar 21
Lecture 20 Causal Inference and Confounding
Mar 22
Lab 9 Probability (due Apr 2)
Homework 6 Probability (due Apr 4)

Week 11

Mar 25
Spring Break
Mar 26
Spring Break
Mar 27
Spring Break
Mar 28
Spring Break
Mar 29
Spring Break

Week 12

Apr 2
Lecture 21 SQL I

Discussion 10 RVs, Bias, and Variance

Apr 4
Lecture 22 SQL II and Cloud Data
Apr 5
Lab 10 SQL (due Apr 9)
Homework 7 SQL (due Apr 11)

Week 13

Apr 9
Lecture 23 Logistic Regression I

Discussion 11 SQL

Apr 11
Lecture 24 Logistic Regression II
Apr 12
Lab 11 Logistic Regression (due Apr 16)
Project B1 Spam & Ham I (due Apr 18)

Week 14

Apr 16
Lecture 25 PCA I

Discussion 12 Logistic Regression

Apr 18
Lecture 26 PCA II
Apr 19
Lab 12 PCA (due Apr 23)
Project B2 Spam & Ham II (due Apr 25)

Week 15

Apr 23
Lecture 27 K-Means Clustering

Discussion 13 PCA

Apr 25
Lecture 28 Guest Lecture and Closing
Apr 26
Lab 13 Clustering (due Apr 30)

Week 16

Apr 29
RRR
Apr 30
RRR
May 1
RRR
May 2
RRR
May 3
RRR

Week 17

May 9
Final Exam (8-11 AM PDT)