Here is a collection of resources that will help you learn more about various concepts and skills covered in the class. Learning by reading is a key part of being a well-rounded data scientist. We will not assign mandatory reading but instead encourage you to look at these and other materials. If you find something helpful, post it on EdStem, and consider contributing it to the course website.
- Optional Supplementary Textbook
- Exam Resources
- Course Website
- Local Setup
- Coding and Probability Resources
- Data Science Education
Optional Supplementary Textbook
Alongside each lecture are optional readings from the Data 100 textbook, Principles and Techniques of Data Science. These readings are purely supplementary; they may contain material that is out of scope and may not be comprehensive. The textbook is actively in development during Spring 2022! Some readings may become out-of-date or be reordered as the semester progresses. If you see a reading on our schedule that no longer exists, don’t hesitate to send a pull request to our course GitHub (see below).
Exam Resources
|Semester||Midterm (1)||Midterm 2||Final|
|Fall 2021||Exam (Solutions)|
|Summer 2021||Exam (Solutions) [Video]||Exam (Solutions)|
|Spring 2021||Exam (Solutions)||Exam (Solutions)|
|Fall 2020||Exam (Solutions)||Exam (Solutions)|
|Summer 2020||Exam (Solutions)||Exam (Solutions)||Exam (Solutions)|
|Spring 2020||Checkpoint (Solutions)||N/A|
|Fall 2019||Exam (Solutions)||Exam (Solutions)||Exam (Solutions)|
|Summer 2019||Exam (Solutions) [Video]||Exam (Solutions)|
|Spring 2019||Exam (Solutions) [Video]||Exam (Solutions) [Video]||Exam (Solutions)|
|Fall 2018||Exam (Solutions)||Exam (Solutions)|
|Spring 2018||Exam (Solutions)||Exam (Solutions) [Video]|
|Fall 2017||Exam (Solutions) [Video]||Exam (Solutions)|
|Spring 2017||Exam (Solutions)||Exam (Solutions)|
Course Website
We will post all lecture materials on the course syllabus. They will also be listed in the following publicly visible GitHub repo.
Local Setup
Click here to read our guide on how to set up our development environment locally (as an alternative to using DataHub). Please note that any autograder tests will only work on DataHub.
Coding and Probability Resources
- The Pandas Cookbook: This provides a nice overview of some of the basic Pandas functions. However, it is slightly out of date.
- Learn Pandas: A set of lessons providing an overview of the Pandas library.
- Python for Data Science: Another set of notebooks demonstrating Pandas functionality.
- We’ve assembled some SQL Review Slides to help you brush up on SQL.
- We’ve also compiled a list of SQL practice problems, which can be found here, along with their solutions.
- This SQL Cheat Sheet is an awesome resource that was created by Luke Harrison, a former Data 100 student.
- We’ve compiled a few practice probability problems that we believe may help in understanding the ideas covered in the course. They can be found here, along with their solutions.
- We’d also like to point you to the textbook for Stat 88, an introductory probability course geared towards data science students at Berkeley.
- We’ve organized some regex problems to help you get extra practice on regex in a notebook format. They can be found here, along with their solutions.
Other Web References
As a data scientist, you will often need to search for information on various libraries and tools. In this class we will be using several key Python libraries. Here are their documentation pages:
The Bash Command Line:
- Python Tutorial: Teach yourself Python. This is a pretty comprehensive tutorial.
- Python + Numpy Tutorial: This tutorial provides a great overview of a lot of the functionality we will be using in DS100.
- Python 101: A notebook demonstrating a lot of Python functionality with some (minimal) explanation.
- Data Visualization:
- matplotlib.pyplot tutorial: This short tutorial provides an overview of the basic plotting utilities we will be using.
- Altair Documentation: Altair (Vega-Lite) is a new and powerful visualization library. We might not get to teach it this semester, but you should check it out if you are interested in exploring visualization in more depth. In particular, you should find the example gallery helpful.
- Prof. Jeff Heer’s Visualization Curriculum: This repository contains a series of Python-based Jupyter notebooks that teach data visualization using Vega-Lite and Altair.
- If you are interested in learning more about data visualization, you can find more materials in:
Because data science is a relatively new and rapidly evolving discipline, there is no single ideal textbook for this subject. Instead, we plan to use readings from a collection of books, all of which are free. However, we have listed a few optional books that will provide additional context for those who are interested.
Principles and Techniques of Data Science, the Data 100 textbook.
Introduction to Statistical Learning (Free online PDF): This book is a great reference for the machine learning material and some of the statistics material in the class.
Data Science from Scratch (Available as eBook for Berkeley students): This more applied book covers many of the topics in this class using Python but doesn’t go into sufficient depth for some of the more mathematical material.
Data Science Education
Interested in bringing the Data Science major or curriculum to your academic institution? Please fill out this form if you would like support from Berkeley in offering some variant of our Data Science courses at your institution (or just to let us know that you’re interested). Information about the courses appears at data8.org and ds100.org. Please note that this form is only for instructors. If you are only interested in learning Python or data science, please look at our Data 8 or Data 100 websites mentioned above.