# Syllabus


## About Data 100

Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand their world. This intermediate-level class bridges between Data8 and upper division computer science and statistics courses as well as methods courses in other fields. In this class, we explore key areas of data science including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, this class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

### Goals

- Prepare students for advanced Berkeley courses in data management, machine learning, and statistics by providing the necessary foundation and context
- Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques
- Empower students to apply computational and inferential thinking to address real-world problems

### Prerequisites

While we are working to make this class widely accessible, we currently require the following (or equivalent) prerequisites. **We are not enforcing prerequisites during enrollment. However, all of the prerequisites will be used starting very early on in the class. It is your responsibility to know the material in the prerequisites.**

- **Foundations of Data Science**: Data8 covers much of the material in Data 100 but at an introductory level. Data8 provides basic exposure to Python programming and working with tabular data as well as visualization, statistics, and machine learning.
- **Computing**: *The Structure and Interpretation of Computer Programs* (CS 61A) or *Computational Structures in Data Science* (CS 88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable Data 100 to focus more on the concepts in Data Science and less on the details of programming in Python.
- **Math**: *Linear Algebra* (Math 54, EE 16a, or Stat89a): We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This may be satisfied concurrently with Data 100.

## Online Format

This summer, Data 100 will be run entirely online. This section details exactly how each component of the course will operate.

- To see when any live events are scheduled, check the Calendar.
- To see when lectures, discussions, and assignments are released (and due), check the Home Page.

### Lecture

- There are 4 lectures per week over the summer.
- **Lectures will be entirely pre-recorded**, in a format that is optimized for online learning (short 5-10 minute videos with conceptual problems in between). Lecture videos will be released on the mornings of Monday, Tuesday, Wednesday, and Thursday (or the night before).
  - Some of these will be from previous semesters, and some will be recorded this summer by the instructors.
  - Lecture videos will be posted on YouTube. Each “lecture” will be an html page linked on the course website, containing videos and links to slides and code.
  - There are “Quick Check” conceptual questions in between each lecture video, linked on the lecture webpage. These are meant for you to check your understanding of the concepts that were just introduced. These are not graded.
  - Each lecture will also have a Piazza thread for students to ask questions.

- In order to facilitate some interaction, instructors will be holding **one live “lecture recap” per week**. It will be an hour long, hosted via Zoom on Fridays from 12-1PM (and will be recorded). Most notably, **we will not introduce new concepts in these recap sessions**; instead, they’ll consist of:
  - Answering students’ questions.
  - Going over challenging problems (past exam problems or, more generally, problems that push students beyond what they’ve already learned).
  - Other high-level overviews as deemed necessary.

- The instructors will also be hosting several conceptual office hours per week. See the Calendar for more details.
- Some special lectures will be live (such as the first lecture, last lecture, and any guest lectures). Specifics will be on Piazza.

Note: Alongside each lecture are textbook readings. Textbook readings are purely supplementary, and may contain material that is not in scope (and may also not be comprehensive).

### Homeworks and Projects

Homeworks are half-week long assignments that are designed to help students develop an in-depth understanding of both the theoretical and practical aspects of ideas presented in lecture. Projects are week-long assignments that integrate these ideas with real-world datasets.

- Each week, there are two homework assignments due.
  - One homework will come out Sunday morning and will be due Wednesday night.
  - The other will come out Thursday morning and will be due Sunday night.

- In some weeks, there will be a single long project instead of two homeworks.
- In total, there will be 8 homeworks and 2 projects.
- During midterm weeks, students will have a week to work on a homework (that they’d otherwise have half a week to work on).

- Two homeworks will be on-paper written assignments; the rest will be Jupyter notebooks.
- The primary forms of support students will have for homeworks and projects are the **office hours** we’ll host, and **Piazza**.

### Labs

Labs are shorter programming assignments designed to give students familiarity with new ideas.

- Each week, there are two lab assignments due.
  - One lab will come out Monday morning and will be due Tuesday night.
  - The other will come out Wednesday morning and will be due Thursday night.

- The primary forms of support students will have for labs are the **office hours** we’ll host, and **Piazza**.
- We are also experimenting with a **live lab section**, in which GSIs walk through the lab assignment via Zoom. The current plan is to host one of these for each lab assignment (one on Tuesday, one on Thursday); see the Calendar for when these are scheduled.

### Discussions

Discussion sections are meant to give students a chance to discuss conceptual ideas and solve problems with other students, with the help of a GSI (this is somewhat harder now that the course is being offered completely remotely). Each discussion consists of a worksheet.

- Each week, there are two discussion worksheets.
  - The first discussion will come out on Monday morning.
  - The other will come out on Wednesday morning.

- There are two “pathways” we envision students taking when it comes to consuming discussion content.
  - Watching a pre-recorded discussion video, and coming to a discussion recap section.
    - Each discussion worksheet will be accompanied by a GSI-created video walkthrough, released at the same time. Students should watch this video soon after it is released.
    - Then, students should come to the “discussion recap”, held on Monday afternoon for the first discussion and Wednesday afternoon for the second discussion (one each). The goal of these is to finish answering questions in the videos, and to serve as conceptual office hours for those concepts.
  - Coming to a live Zoom discussion section.
    - There will be three live Zoom discussions on Monday (morning, afternoon, night) and three on Wednesday, for the first and second discussions of the week, respectively. GSIs will host these in pairs and take turns rotating through.

- Each week, we will survey students on which of the two pathways they used and found helpful, and will scale our resources accordingly.

### Office Hours

- We plan on hosting roughly 10 hours of office hours each weekday. These hours are listed on the Calendar.
  - OH will serve as a one-stop shop for students to get help with assignments.
  - Notably, they also serve as a replacement for traditional lab sessions.

- Office Hours can be accessed via oh.ds100.org, where students add themselves to the “queue” and specify the assignment they need help on. Once it’s their turn, they will be provided with a Zoom link to join, in order to get help from staff.
- The instructors will also be hosting conceptual office hours. These will be reflected on the Calendar.

### Exams

There will be two midterm exams, and a final exam. Details about dates can be found in the policies section below.

## Policies

Scores in the course are assigned according to the following weights:

| Category | Weight | Details |
|---|---|---|
| Homeworks | 28% | 4% each (8, with 1 drop) |
| Labs | 10% | 1% each (14, with 4 drops) |
| Surveys | 2% | 0.25% each (8, with 0 drops) |
| Projects | 12% | 6% each (2, with 0 drops) |
| Midterm 1 | 12% | |
| Midterm 2 | 12% | |
| Final | 24% | two days, 12% each day |
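To make the arithmetic behind these weights concrete, here is a minimal Python sketch of how an overall percentage could be computed from the table. It is an illustration, not the staff's actual grading script: the names `CATEGORIES`, `category_average`, and `overall_score` are hypothetical, and it assumes that every kept assignment within a category counts equally and that scores are expressed as fractions of full credit.

```python
# Weights and drop counts copied from the table above.
# name: (weight in percent, number of assignments, lowest scores dropped)
CATEGORIES = {
    "homeworks": (28, 8, 1),
    "labs": (10, 14, 4),
    "surveys": (2, 8, 0),
    "projects": (12, 2, 0),
    "midterm1": (12, 1, 0),
    "midterm2": (12, 1, 0),
    "final": (24, 2, 0),  # two exam days, 12% each
}


def category_average(scores, drops):
    """Mean of the scores (fractions of full credit) after dropping the lowest `drops`."""
    kept = sorted(scores)[drops:]
    return sum(kept) / len(kept)


def overall_score(scores_by_category):
    """Weighted overall percentage, assuming equal weight per kept assignment."""
    total = 0.0
    for name, (weight, count, drops) in CATEGORIES.items():
        scores = scores_by_category[name]
        if len(scores) != count:
            raise ValueError(f"expected {count} scores for {name}, got {len(scores)}")
        total += weight * category_average(scores, drops)
    return total
```

Under these assumptions, a student who skips one homework and four labs but earns full credit everywhere else still ends up with full marks, because the zeros fall within the dropped scores.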

#### Assignments

- **Homeworks:** Homeworks are usually assigned twice every week (see Projects below). They must be completed individually and will mix programming and short-answer questions. Homeworks have both visible and hidden autograder tests. The visible tests are mainly sanity checks, e.g. a probability is <= 1, and are visible to students while they do the assignment. The hidden tests generally check for correctness, and are invisible to students while they are doing the assignment. Your lowest homework score will be dropped.
- **Labs:** Labs are assignments that complement the homeworks. There will be two lab assignments every week. All lab autograder tests are visible. Your four lowest lab scores will be dropped.
- **Surveys:** Weekly check-ins to gauge and receive student feedback, via Google Forms.
- **Projects:** Projects are week-long assignments that synthesize multiple topics.

#### Exams

- **Midterms:** There will be two midterms.
  - Midterm 1: **Thursday, July 9th, 7-8:30PM PDT**.
  - Midterm 2: **Monday, July 27th, 7-8:30PM PDT**.
  - Alternate exams will only be given to students with a documented conflict, or to those who are in timezones where 7-8:30PM PDT is extremely inconvenient.
- **Final:** We are currently planning on having a two-day final, held on **Wednesday, August 12th** and **Thursday, August 13th**, from **7-8:30PM PDT** on both days.
  - Each day will consist of a separate 1.5 hour exam.
  - The final will not be proctored at all (previously, we intended on using Zoom to proctor).

### Late Policy

All assignments are due at 11:59 pm on the due date specified on the syllabus. Extensions are only provided to students with DSP accommodations, or in the case of exceptional circumstances.

- **Homeworks and labs will not be accepted late.**
  - Gradescope may allow you to make late submissions, but you will later be given a 0.
- Projects are marked down by 10% per day, **up to two days**. After two days, project submissions will not be accepted.
  - Submission times are rounded up to the next day. That is, 2 minutes late = 1 day late.
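The project late penalty can be worked through with a short Python sketch. This is one reading of the policy above, not an official calculator: the helper names `days_late` and `project_multiplier` are hypothetical, and the sketch assumes that each late day subtracts 10 percentage points of full credit and that a submission more than two days late earns nothing.

```python
import math
from datetime import datetime, timedelta

DAILY_PENALTY = 0.10  # 10% per late day, per the policy above
MAX_LATE_DAYS = 2     # after two days, submissions are not accepted


def days_late(due, submitted):
    """Whole days late, rounded up, so 2 minutes late counts as 1 day."""
    if submitted <= due:
        return 0
    return math.ceil((submitted - due) / timedelta(days=1))


def project_multiplier(due, submitted):
    """Fraction of the earned project score that is kept (assumed subtractive penalty)."""
    late = days_late(due, submitted)
    if late > MAX_LATE_DAYS:
        return 0.0  # treated here as not accepted
    return 1.0 - DAILY_PENALTY * late


# Hypothetical deadline, for illustration only: 11:59 pm on some due date.
due = datetime(2020, 7, 5, 23, 59)
print(project_multiplier(due, due + timedelta(minutes=2)))
```

With this reading, a project submitted two minutes after the deadline keeps 90% of its score, one submitted a little over a day late keeps 80%, and anything later than two days is not accepted.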

### Collaboration Policy and Academic Dishonesty

#### Assignments

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually in your own words. If you do discuss the assignments with others, please include their names at the top of your notebook. Keep in mind that content from assignments will likely be covered on both the midterms and the final.

If we suspect that you have submitted plagiarized work, we will call you in for a meeting. If we then determine that plagiarism has occurred, we reserve the right to give you a negative full score (-100%) or lower on the assignments in question, along with reporting your offense to the Center for Student Conduct.

Rather than copying someone else’s work, ask for help. You are not alone in this course! The entire staff is here to help you succeed. If you invest the time to learn the material and complete the assignments, you won’t need to copy any answers. (taken from 61A)

#### Exams

Cheating on exams is a serious offense. We have methods of detecting cheating on exams – so don’t do it! Students caught cheating on any exam will fail this course. We will be following the EECS departmental policy on Academic Honesty, so be sure you are familiar with it.

### We want you to succeed!

If you are feeling overwhelmed, visit our office hours and talk with us. We know college can be stressful – and especially so during the COVID-19 pandemic – and we want to help you succeed.