Principles and Techniques of Data Science
Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand their world. This intermediate-level class bridges between Data8 and upper-division computer science and statistics courses as well as methods courses in other fields. In this class, we explore key areas of data science including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, this class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.
This class is listed as STAT C100 and as COMPSCI C100.
Important Information:
- When: Lectures Tuesdays and Thursdays from 11:00AM to 12:30PM
- Where: 150 Wheeler
- What: See the lecture schedule
- News: We will post updates about the class on Piazza
If you have issues with enrollment, contact Cindy Conners.
Office Hours, Section, and Lab Schedule
For official holidays, see the academic calendar.
Goals
- Prepare students for advanced Berkeley courses in data management, machine learning, and statistics by providing the necessary foundation and context
- Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques
- Empower students to apply computational and inferential thinking to address real-world problems
Prerequisites
While we are working to make this class widely accessible, we currently require the following (or equivalent) prerequisites:
- Foundations of Data Science: Data8 covers much of the material in DS100 but at an introductory level. Data8 provides basic exposure to Python programming and working with tabular data as well as visualization, statistics, and machine learning.
- Computing: The Structure and Interpretation of Computer Programs (CS61A) or Computational Structures in Data Science (CS88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable DS100 to focus more on the concepts in data science and less on the details of programming in Python.
- Math: Linear Algebra (Math 54, EE 16a, or Stat89a). We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This prerequisite may be satisfied concurrently with DS100.
Instructors
jegonzal@cs.berkeley.edu
Fernando.Perez@berkeley.edu
Teaching Assistants
bjiang@berkeley.edu
jake_soloff@berkeley.edu
sona.jeswani@berkeley.edu
nhiquach@berkeley.edu
do@berkeley.edu
edward.fang@berkeley.edu
manana.hakobyan@berkeley.edu
calebs11@berkeley.edu
jsylo@berkeley.edu
louis.remus@berkeley.edu
xmo@berkeley.edu
justinkangg@berkeley.edu
amandhar@berkeley.edu
kgoot@berkeley.edu
aakash.bhalothia@berkeley.edu
weiwzhang@berkeley.edu
Undergraduate Research Opportunities
Berkeley is an amazing place to learn about and participate in research. We strongly encourage students to look for research opportunities as well as opportunities to get involved in building the tools used by data scientists around the world. The following is a list of research and development opportunities:
- Professor Gonzalez is often looking for undergraduates to get involved in his many projects. If you are interested in getting involved, stop by office hours and complete this Google form.
- The Berkeley Institute for Data Science has many opportunities for research. Attend these events to learn more about what people are doing, and ask them how you can help.
- Take a look at some of the issues on big open source projects and consider getting involved in addressing them: