Resources
Here is a collection of resources that will help you learn more about the concepts and skills covered in the class. Learning by reading is a key part of being a well-rounded data scientist. We will not assign mandatory reading but instead encourage you to look at these and other materials. If you find something helpful, post it on EdStem, and consider contributing it to the course website.
Jump to:
- Supplementary Course Notes
- Optional Supplementary Textbook
- Exam Resources
- Course Website
- Coding and Mathematics Resources
- Books
- Wellness Resources
- Data Science Education
- Local Setup (Old)
Supplementary Course Notes
Alongside each lecture are supplementary Course Notes. These are in development for the Spring 2023 Edition of the UC Berkeley course Data 100: Principles and Techniques of Data Science.
Lecture notes will be updated on a weekly basis, prior to the lecture. If you spot any errors or would like to suggest any changes, please email us at data100.instructors@berkeley.edu.
Optional Supplementary Textbook
Alongside each lecture are optional readings from the Data 100 textbook, Principles and Techniques of Data Science. Textbook readings are purely supplementary; they may contain material that is not in scope and may not be comprehensive. The textbook is actively in development during Spring 2023! Some readings may become out-of-date or be reordered as the semester progresses. If you see a reading on our schedule that no longer exists, don’t hesitate to send a pull request to our course GitHub (see below).
Exam Resources
Semester | Midterm (1) | Midterm 2 | Final | Reference Sheet |
---|---|---|---|---|
Spring 2023 | Exam (Solutions) | Exam (Solutions) | Midterm, Final | |
Fall 2022 | Exam (Solutions) | Midterm | ||
Summer 2022 | Exam (Solutions) | Exam (Solutions) | Midterm, Final | |
Spring 2022 | Exam (Solutions) | Exam (Solutions) | Exam (Solutions) | Midterm 1, Midterm 2, Final |
Fall 2021 | Exam (Solutions) | |||
Summer 2021 | Exam (Solutions) [Video] | Exam (Solutions) | ||
Spring 2021 | Exam (Solutions) | Exam (Solutions) | ||
Fall 2020 | Exam (Solutions) | Exam (Solutions) | ||
Summer 2020 | Exam (Solutions) | Exam (Solutions) | Exam (Solutions) | |
Spring 2020 | Checkpoint (Solutions) | N/A | Checkpoint | |
Fall 2019 | Exam (Solutions) | Exam (Solutions) | Exam (Solutions) | Midterm 1 |
Summer 2019 | Exam (Solutions) [Video] | Exam (Solutions) | ||
Spring 2019 | Exam (Solutions) [Video] | Exam (Solutions) [Video] | Exam (Solutions) | Midterm 1 |
Fall 2018 | Exam (Solutions) | Exam (Solutions) | ||
Spring 2018 | Exam (Solutions) | Exam (Solutions) [Video] | ||
Fall 2017 | Exam (Solutions) [Video] | Exam (Solutions) | ||
Spring 2017 | Exam (Solutions) | Exam (Solutions) |
Course Website
We will be posting all lecture materials on the course syllabus. They will also be listed in the following publicly visible GitHub repo.
You can send us changes to the course website by forking the course website GitHub repository and sending a pull request. You will then become part of the history of Data 100 at Berkeley.
Coding and Mathematics Resources
Pandas
- DS100 Textbook Pandas Reference Table
- Pandas API Reference
- The Pandas Cookbook: This provides a nice overview of some of the basic Pandas functions. However, it is slightly out of date.
- Learn Pandas: A set of lessons providing an overview of the Pandas library.
- Python for Data Science: Another set of notebooks demonstrating Pandas functionality.
SQL
- We’ve assembled some SQL Review Slides to help you brush up on SQL.
- We’ve also compiled a list of SQL practice problems, which can be found here, along with their solutions.
- This SQL Cheat Sheet is an awesome resource that was created by Luke Harrison, a former Data 100 student.
Regex
- Regex101.com. Remember to select the Python flavour of Regex!
- DS100 Reference Sheet
- We’ve organized some regular expression (regex) problems in a notebook format to help you get extra practice. They can be found here, along with their solutions.
- The official Python 3 regex guide is good: link
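As a small illustration of why the Python flavour matters, the sketch below uses Python's `(?P<name>...)` named-group syntax with the standard `re` module. The log line and field names are made up for this example and do not come from any course assignment.

```python
import re

# Hypothetical sample text, invented purely for illustration.
log_line = "2023-01-17 09:05:12 GET /resources 200"

# Raw strings (r"...") keep backslashes intact.
# (?P<name>...) is Python's syntax for a named capture group.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<path>\S+) "
    r"(?P<status>\d{3})"
)

match = pattern.match(log_line)
# match is None when the line does not fit the pattern.
fields = match.groupdict() if match else {}

print(fields.get("date"))
print(fields.get("status"))

# findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", log_line)
print(numbers)
```

Pasting the same pattern into Regex101 with the Python flavour selected should highlight the same groups.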
LaTeX
Other Web References
As a data scientist, you will often need to search for information on various libraries and tools. In this class we will be using several key Python libraries. Here are their documentation pages:
- Python:
- DS100 Textbook scikit-learn Reference Table
- Python Tutorial: Teach yourself Python. This is a pretty comprehensive tutorial.
- Python + Numpy Tutorial: This tutorial provides a great overview of a lot of the functionality we will be using in DS100.
- Python 101: A notebook demonstrating a lot of Python functionality with some (minimal) explanation.
- Data Visualization:
- DS100 Textbook Seaborn Reference Table and Matplotlib Reference Table
- matplotlib.pyplot tutorial: This short tutorial provides an overview of the basic plotting utilities we will be using.
- Pandas Tutor.
- Kernel Density Visualization.
- Altair Documentation: Altair (Vega-Lite) is a new and powerful visualization library. We might not get to teach it this semester, but you should check it out if you are interested in pursuing visualization deeper. In particular, you should find the example gallery helpful.
- Prof. Jeff Heer’s Visualization Curriculum: This repository contains a series of Python-based Jupyter notebooks that teach data visualization using Vega-Lite and Altair.
- If you are interested in learning more about data visualization, you can find more materials in:
- Edward Tufte’s book sequences – a classic!
- Prof. Heer’s class.
Calculus and Linear Algebra
Note: None of these resources are meant to be a substitute for the appropriate requirement / co-requisite (Math 54, etc.). If you have no familiarity whatsoever with either of these topics, these may not be adequate and we strongly recommend spending time covering the prerequisite material yourself. We will assume that you have prior knowledge of these requirements and that these resources are simply to refresh your memory of concepts that you have previously learned. Please reach out to staff if you have any questions or concerns about this.
Calculus: In terms of calculus, you will need to know a few things, most of which are covered within the space of the first homework and lab. Specifically, you will need to know univariate calculus rules like: taking derivatives of a univariate function (i.e. f(x), where x is the only variable); the derivative power rule; derivatives of mathematical functions like sin(x), cos(x), log(x), and e^x; the chain rule; the product rule (rarely); and derivatives of sums. We will expect some multivariate fluency like: taking partial derivatives of a multivariate function (i.e. f(x, y, z), where x, y, z are all variables); and gradients (the concept).
- Khan Academy: Derivatives, Definitions, and Basic Rules; Multivariable Derivatives
- Math 53: Derivatives of Vector Functions
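One way to check your understanding of the rules above is to compare a hand-derived derivative against a numerical approximation. The sketch below is only an illustration, not course material: the functions `f` and `g`, the helper names, and the evaluation points are all chosen arbitrarily for this example.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def numerical_gradient(func, point, h=1e-6):
    """Approximate each partial derivative of func at the given point."""
    partials = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        partials.append((func(*up) - func(*down)) / (2 * h))
    return tuple(partials)

# Chain rule: d/dx sin(x^2) = 2x * cos(x^2)
def f(x):
    return math.sin(x ** 2)

def f_prime(x):
    return 2 * x * math.cos(x ** 2)

# Gradient: for g(x, y) = x^2 * y + e^y,
# dg/dx = 2xy and dg/dy = x^2 + e^y
def g(x, y):
    return x ** 2 * y + math.exp(y)

def grad_g(x, y):
    return (2 * x * y, x ** 2 + math.exp(y))

x0 = 1.3
print("hand-derived f'(x0):", f_prime(x0))
print("numerical    f'(x0):", numerical_derivative(f, x0))

point = (1.5, 0.5)
print("hand-derived gradient:", grad_g(*point))
print("numerical    gradient:", numerical_gradient(g, point))
```

If the hand-derived and numerical values disagree by more than a tiny rounding error, the hand derivation is the first place to look.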
Linear Algebra:
Concepts roughly in order of importance: vectors, matrices; rank/nullity; inner products, orthogonality, norms; linear independence; orthonormal matrices; vector spaces; projections; invertibility.
- EE16A notes/assignments: Vector and Matrix Operations (Note 2A, Note 2B); Span, Linear Dependence/Independence (Note 3); Linear Transformations (Note 5); Matrix Inversion (Note 6); Vector Subspaces (Note 6); Inner Products (Note 21); Least Squares (Note 23)
- Math 54: Prof. Alex Paulin Video Lectures
- Data 100 textbook: Geometric Perspective of Linear Projection (Chapter 15); Vector Spaces (Appendix 2)
- 3blue1brown: Essence of Linear Algebra
- Khan Academy: Linear Algebra
- MIT OpenCourseware: Linear Algebra Video Lectures
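To make a few of the concepts listed above concrete (inner products, norms, orthogonality, and projections), here is a minimal sketch in plain Python. The vectors `a` and `b` and the helper names are made up for this example; in the course itself you would more likely use a numerical library than hand-written helpers.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(u, u))

def project(b, a):
    """Projection of b onto the line spanned by a: (a.b / a.a) * a."""
    scale = dot(a, b) / dot(a, a)
    return [scale * ai for ai in a]

# Example vectors, chosen only so the arithmetic is easy to follow.
a = [1.0, 2.0, 2.0]
b = [3.0, 0.0, 3.0]

p = project(b, a)
residual = [bi - pi for bi, pi in zip(b, p)]

print(norm(a))           # 3.0
print(p)                 # [1.0, 2.0, 2.0]
print(residual)          # [2.0, -2.0, 1.0]
print(dot(residual, a))  # 0.0, so the residual is orthogonal to a
```

The residual being orthogonal to `a` is the same geometric fact that underlies least squares.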
Probability
- We’ve compiled a few practice probability problems that we believe may help in understanding the ideas covered in the course. They can be found here, along with their solutions.
- We’d also like to point you to the textbook for Data C88S, an introductory probability course geared towards data science students at Berkeley.
Books
Because data science is a relatively new and rapidly evolving discipline, there is no single ideal textbook for this subject. Instead, we plan to use readings from a collection of books, all of which are free. We have also listed a few optional books that will provide additional context for those who are interested.
- Principles and Techniques of Data Science, the Data 100 textbook.
- Introduction to Statistical Learning (Free online PDF): This book is a great reference for the machine learning and some of the statistics material in the class.
- Data Science from Scratch (Available as eBook for Berkeley students): This more applied book covers many of the topics in this class using Python but doesn’t go into sufficient depth for some of the more mathematical material.
- Doing Data Science (Available as eBook for Berkeley students): This book provides a unique case-study view of data science but uses R and not Python.
- Python for Data Analysis (Available as eBook for Berkeley students): This book provides a good reference for the Pandas library.
Wellness Resources
Your well-being matters, and we hope that Data 100 is never a barrier to taking care of your mental and physical health. Below are some campus resources that may be helpful.
COVID-19 Resources and Support
You can find UC Berkeley’s COVID-19 resources and support here.
For academic performance, support, and technology
The Center for Access to Engineering Excellence (Bechtel Engineering Center 227) is an inclusive center that offers study spaces, nutritious snacks, and tutoring in >50 courses for Berkeley engineers and other majors across campus. The Center also offers a wide range of professional development, leadership, and wellness programs, and loans iclickers, laptops, and professional attire for interviews.
As the primary academic support service for undergraduates at UC Berkeley, the Student Learning Center (510-642-7332) assists students in transitioning to Cal, navigating the academic terrain, creating networks of resources, and achieving academic, personal, and professional goals. Through various services including tutoring, study groups, workshops, and courses, SLC supports undergraduate students in Biological and Physical Sciences, Business Administration, Computer Science, Economics, Mathematics, Social Sciences, Statistics, Study Strategies, and Writing.
The Educational Opportunity Program (EOP, Cesar Chavez Student Center 119; 510-642-7224) at Cal has provided first-generation and low-income college students with the guidance and resources necessary to succeed at the best public university in the world. EOP’s individualized academic counseling, support services, and extensive campus referral network help students develop the unique gifts and talents they each bring to the university while empowering them to achieve.
Students can access device lending options through the Student Technology Equity Program (STEP).
For mental well-being
The staff of the UHS Counseling and Psychological Services (Tang Center, 2222 Bancroft Way; 510-642-9494; for after-hours support, please call the 24/7 line at 855-817-5667) provides confidential, brief counseling and crisis intervention to students with personal, academic and career stress. Services are provided by a multicultural group of professional counselors including psychologists, social workers, and advanced level trainees. All undergraduate and graduate students are eligible for CAPS services, regardless of insurance coverage.
To improve access for engineering students, a licensed psychologist from the Tang Center also holds walk-in appointments for confidential counseling in Bechtel Engineering Center 241 (check here for schedule).
For disability accommodations
The Disabled Students’ Program (DSP, 260 César Chávez Student Center #4250; 510-642-0518) serves students with disabilities of all kinds, including mobility impairments; blind or low vision; deaf or hard of hearing; chronic illnesses (chronic pain, repetitive strain injuries, brain injuries, AIDS/HIV, cancer, etc.); psychological disabilities (bipolar disorder, severe anxiety or depression, etc.); Attention Deficit Disorder/Attention Deficit Hyperactivity Disorder; and learning disabilities. Services are individually designed and based on the specific needs of each student as identified by DSP’s Specialists. The Program’s official website includes information on DSP staff, UCB’s disabilities policy, application procedures, campus access guides for most university buildings, and portals for students and faculty.
For solving a dispute
The Ombudsperson for Students (Sproul Hall 102; 510-642-5754) provides a confidential service for students involved in a University-related problem (academic or administrative), acting as a neutral complaint resolver and not as an advocate for any of the parties involved in a dispute. The Ombudsperson can provide information on policies and procedures affecting students, facilitate students’ contact with services able to assist in resolving the problem, and assist students in complaints concerning improper application of University policies or procedures. All matters referred to this office are held in strict confidence. The only exceptions, at the sole discretion of the Ombudsperson, are cases where there appears to be imminent threat of serious harm.
The Student Advocate’s Office (SAO) is an executive, non-partisan office of the ASUC. We offer free, confidential casework services and resources to any student(s) navigating issues with the University, including academic, conduct, financial aid, and grievance concerns. All support is centered around students and aims for an equity-based approach.
For recovery from sexual harassment or sexual assault
The Care Line (510-643-2005) is a 24/7, confidential, free, campus-based resource for urgent support around sexual assault, sexual harassment, interpersonal violence, stalking, and invasion of sexual privacy. The Care Line will connect you with a confidential advocate for trauma-informed crisis support including time-sensitive information, securing urgent safety resources, and accompaniment to medical care or reporting.
For social services
Social Services provides confidential services and counseling to help students manage problems that can emerge from illness, such as financial, academic, legal, and family concerns. They specialize in helping students with pregnancy resources and referrals; alcohol/drug problems related to one’s own or a family member’s use; sexual assault/rape; relationship or other violence; and support for health concerns (new diagnoses or ongoing conditions). Social Services staff will assess a student’s immediate needs, work with the student to develop a plan to meet those needs, facilitate arrangements with academic departments, advocate for the student with other campus offices and community agencies, and coordinate services within UHS.
For finding community on campus
The Berkeley International Office (2299 Piedmont Avenue, 510-642-2818) provides advising, immigration services, advocacy, and programming for international students and scholars at UC Berkeley.
The Gender Equity Resource Center, fondly referred to as GenEq, is a UC Berkeley campus community center committed to fostering an inclusive Cal experience for all. GenEq is the campus location where students, faculty, staff, and alumni connect for resources, services, education, and leadership programs related to gender and sexuality. The programs and services of the Gender Equity Resource Center are focused on four key areas: women; lesbian, gay, bisexual, and transgender (LGBT); sexual and dating violence; and hate crimes and bias-driven incidents. GenEq strives to provide a space for respectful dialogue about sexuality and gender; illuminate the interrelationship of sexism, homophobia, and gender bias and violence; create a campus free of violence and hate; provide leadership opportunities; advocate on behalf of survivors of sexual, hate, dating, and gender violence; foster a community of women and LGBT leaders; and be a portal to campus and community resources on LGBT, Women, and the many intersections of identity (e.g., race, class, ability, etc.).
The Undocumented Students Program (119 Cesar Chavez Center; 642-7224) practices a holistic, multicultural, and solution-focused approach that delivers individualized service for each student. The academic counseling, legal support, financial aid resources, and extensive campus referral network provided by USP help students develop the unique gifts and talents they each bring to the university, while fostering a sense of belonging. The program’s mission is to support the advancement of undocumented students within higher education and promote pathways for engaged scholarship.
The Multicultural Education Program (MEP) is one of six initiatives funded by the Evelyn and Walter Haas, Jr. Fund to work towards institutional change and to create a positive campus climate for diversity. The MEP is a five-year initiative to establish a sustainable infrastructure for activities like educational consultation and diversity workshops that address specific topics and cater to group needs across the campus.
For basic needs (food, shelter, etc.)
The Basic Needs Center (lower level of MLK Student Union, Suite 72) provides support with all the essential resources needed to not only survive, but thrive here at UC Berkeley. Their mission is to support you and work together towards justice and belonging for all. They define Basic Needs as the essential resources that impact your health, belonging, persistence, and overall well being. It is an ecosystem that includes: nutritious food, stable housing, hygiene, transportation, healthcare, mental wellness, financial sustainability, sleep, and emergency dependent services. They refuse to accept hunger, homelessness, and all other basic needs injustices as part of our university.
The UC Berkeley Food Pantry (#68 Martin Luther King Student Union) aims to reduce food insecurity among students and staff at UC Berkeley, especially the lack of nutritious food. Students and staff can visit the pantry as many times as they need and take as much as they need while being mindful that it is a shared resource. The pantry operates on a self-assessed need basis; there are no eligibility requirements. The pantry is not for students and staff who need supplemental snacking food, but rather, core food support.
Data Science Education
Interested in bringing the Data Science major or curriculum to your academic institution? Please fill out this form if you would like support from Berkeley in offering some variant of our Data Science courses at your institution (or just to let us know that you’re interested). Information about the courses appears at data8.org and ds100.org. Please note that this form is only for instructors. If you are only interested in learning Python or data science, please look at our Data 8 or Data 100 websites mentioned above.
Local Setup (Old)
NOTE: This section is out of date and no longer supported by the course staff.
Click here to read our guide on how to set up our development environment locally (as an alternative to using DataHub). Please note that any autograder tests will only work on DataHub.