Principles and Techniques of Data Science
UC Berkeley
Offerings
 Summer 2024
 Spring 2024
 Fall 2023
 Summer 2023
 Spring 2023
 Fall 2022
 Summer 2022
 Spring 2022
 Fall 2021
 Summer 2021
 Spring 2021
 Fall 2020
 Summer 2020
 Spring 2020
 Fall 2019
 Summer 2019
 Spring 2019
 Fall 2018
 Spring 2018
 Fall 2017
 Spring 2017
Overview
Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand their world. This intermediate-level class bridges Data 8 and upper-division computer science and statistics courses, as well as methods courses in other fields. In this class, we explore key areas of data science, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision-making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, this class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.
Goals

Prepare students for advanced Berkeley courses in data management (CS 186), machine learning (CS 189), and statistics (Stat 154) by providing the necessary foundation and context

Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques

Empower students to apply computational and inferential thinking to tackle real-world problems
Prerequisites
While we are working to make this class widely accessible, we currently require the following (or equivalent) prerequisites:

Foundations of Data Science: Data 8 covers much of the material in Data 100, but at an introductory level. Data 8 provides basic exposure to Python programming and working with tabular data, as well as visualization, statistics, and machine learning.

Computing: Structure and Interpretation of Computer Programs (CS 61A) or Computational Structures in Data Science (CS 88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable Data 100 to focus more on the concepts of data science and less on the details of programming in Python.

Math: Linear Algebra (Math 54, EE 16A, or Stat 89A). We will need some basic concepts, such as linear operators and derivatives, to enable statistical inference and to derive new prediction algorithms. This requirement may be satisfied concurrently with Data 100.