We will post all lecture materials on the course syllabus. They will also be listed in the following publicly visible Google Drive folder.
Here is a collection of resources that will help you learn more about various concepts and skills covered in the class. Learning by reading is a key part of being a well-rounded data scientist. We will not assign mandatory reading but instead encourage you to look at these and other materials. If you find something helpful, post it on Piazza, and consider contributing it to the course website.
As a data scientist, you will often need to search for information on various libraries and tools. In this class we will be using several key Python libraries. Here are their documentation pages:
- The Bash Command Line:
- Python Tutorial: Teach yourself Python. This is a pretty comprehensive tutorial.
- Python + Numpy Tutorial: This tutorial provides a great overview of much of the functionality we will be using in DS100.
- Python 101: A notebook demonstrating a lot of Python functionality with some (minimal) explanation.
- Getting Started with Git: A tutorial on version control and Git.
- Git Reference: A condensed version of Git instructions.
- Understanding the Git Flow: This will give you a better idea of how Git projects work.
- Learning about Branches: This is a perhaps overly interactive tutorial that some people might find helpful.
- Explaining Git with D3
Because data science is a relatively new and rapidly evolving discipline, there is no single ideal textbook for the course. Instead, we plan to use readings from a collection of books, all of which are free. We have also listed a few optional books that will provide additional context for those who are interested.
Introduction to Statistical Learning (Free online PDF) This book is a great reference for the machine learning material and some of the statistics material in the class.
Data Science from Scratch (Available as eBook for Berkeley students) This more applied book covers many of the topics in this class using Python, but it doesn’t go into sufficient depth for some of the more mathematical material.
Relevant Classes At Berkeley:
- Stat89a: Linear Algebra for Data Science. An introduction to linear algebra for data science. The course will cover introductory topics in linear algebra, starting with the basics; discrete probability and how probability can be used to understand high-dimensional vector spaces; matrices and graphs as popular mathematical structures with which to model data (e.g., as models for term-document corpora, high-dimensional regression problems, ranking/classification of web data, adjacency properties of social network data, etc.); and geometric approaches to eigendecompositions, least-squares, principal components analysis, etc.