About Data 100
Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand their world. This intermediate-level class bridges between Data 8 and upper division computer science and statistics courses as well as methods courses in other fields. In this class, we explore key areas of data science including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, this class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.
- Prepare students for advanced Berkeley courses in data management, machine learning, and statistics by providing the necessary foundation and context
- Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques
- Empower students to apply computational and inferential thinking to address real-world problems
While we are working to make this class widely accessible, we currently require the following (or equivalent) prerequisites. We are not enforcing prerequisites during enrollment. However, all of the prerequisites will be used starting very early on in the class. It is your responsibility to know the material in the prerequisites.
- Foundations of Data Science: Data 8 covers much of the material in Data 100 but at an introductory level. Data 8 provides basic exposure to Python programming and working with tabular data as well as visualization, statistics, and machine learning.
- Computing: The Structure and Interpretation of Computer Programs (CS 61A) or Computational Structures in Data Science (CS 88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable Data 100 to focus more on the concepts in Data Science and less on the details of programming in Python.
- Math: Linear Algebra (Math 54, EE 16A, or Stat 89A): We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This may be satisfied concurrently with Data 100.
This spring, Data 100 will be run entirely online. This section details exactly how each component of the course will operate. But here’s a nice high-level “typical week in the course”:
| Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|
| Office Hours | Office Hours | Office Hours | Office Hours | Office Hours |
| | Lecture released | | Lecture released | |
| Mini-Discussion Section | Mini-Discussion Section | | | |
| Discussion Section | Discussion Section | | | |
| | | | Homework due | Homework released |
| | | | Lab due | Lab released |
Note that these deadlines are subject to change.
- To see when any live events are scheduled, check the Calendar.
- To see when lectures, discussions, and assignments are released (and due), check the Home Page.
This course has one required synchronous component: participating in your discussion section on Monday or Tuesday. The goal of these sessions is to work through problems, hone your skills, and flesh out your understanding as part of a team. The format of the discussion will alternate each week between a meeting of your full discussion section and a shorter meeting of your small group called a mini-discussion.
- Full Discussion Section (50 min): working through new challenging problems with your group and sharing with the larger section.
- Mini Discussion Section (25 min): taking turns discussing selected problems from the previous homework with just your small group.
The problems that you solve and present as part of discussion are important in understanding this material, so they are graded. Be sure to attend discussion.
- There are 2 lectures per week.
- Lectures will be entirely pre-recorded, in a format that is optimized for online learning (short 5-10 minute videos with conceptual problems in between). Lecture videos will be released on the mornings of Tuesday and Thursday.
- Some of these will be from previous semesters, and some will be recorded this spring by the instructors.
- Lecture videos will be posted on YouTube. Each “lecture” will be an html page linked on the course website, containing videos and links to slides and code.
- Each lecture will also have a Piazza thread for students to ask questions.
Note: Alongside each lecture are textbook readings. Textbook readings are purely supplementary, and may contain material that is not in scope (and may also not be comprehensive).
Homeworks are week-long assignments that are designed to help students develop an in-depth understanding of both the theoretical and practical aspects of ideas presented in lecture.
- In a typical week, homework is released on Friday and is due the following Thursday at 11:59PM.
- One or two homeworks will be on-paper written assignments; the rest will be Jupyter notebooks.
- Homeworks have both visible and hidden autograder tests. The visible tests are mainly sanity checks, e.g. a probability is <= 1, and are visible to students while they do the assignment. The hidden tests generally check for correctness, and are invisible to students while they are doing the assignment.
- The primary forms of support for homeworks and projects are office hours and Piazza.
- Homeworks must be completed individually.
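To make the difference between visible and hidden tests concrete, here is a minimal sketch in Python. It is not the course's actual autograder: the function names, the example question, and the tolerance are all hypothetical. It only illustrates that a visible sanity check can pass on an answer that a hidden correctness check would reject.

```python
def estimate_probability(outcomes, event):
    """Hypothetical student answer: fraction of outcomes equal to `event`."""
    return sum(1 for o in outcomes if o == event) / len(outcomes)


def visible_sanity_check(p):
    # Visible-style test (assumed form): only checks that the value
    # is a valid probability, not that it is the right one.
    return 0 <= p <= 1


def hidden_correctness_check(p, expected, tol=1e-9):
    # Hidden-style test (assumed form): compares against a reference answer.
    return abs(p - expected) <= tol


rolls = [1, 6, 3, 6, 2, 6, 4, 5]          # made-up data: 3 sixes in 8 rolls
p_six = estimate_probability(rolls, 6)

print(visible_sanity_check(p_six))             # True
print(hidden_correctness_check(p_six, 3 / 8))  # True

# A wrong answer can still look fine to the visible test.
wrong_answer = 0.5
print(visible_sanity_check(wrong_answer))             # True
print(hidden_correctness_check(wrong_answer, 3 / 8))  # False
```

In other words, passing every visible test suggests the answer has the right shape, but it does not guarantee full credit.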
Labs are shorter programming assignments designed to give students familiarity with new ideas.
- In a typical week, lab is released on Friday and is due the following Thursday.
- All lab autograder tests are visible.
- In previous semesters, we held live lab sections; this semester, we will not be holding live sections for labs. Instead, the labs have been condensed and simplified, and will include a video walk-through to assist students in completing the assignment.
- Students can get help with labs at office hours and on Piazza.
- We plan on hosting roughly 10 hours of office hours each weekday. These hours are listed on the Calendar.
- OH will serve as a one-stop shop for students to get help with assignments.
- Office Hours can be accessed via oh.ds100.org, where students add themselves to the “queue” and specify the assignment they need help on. Once it’s their turn, they will be provided with a Zoom link to join, in order to get help from staff.
- The instructors will also be hosting conceptual office hours. These will be reflected on the Calendar.
There will be one midterm exam, on March 9th (7:00-9:00 PM PST), and a final exam on May 12th (11:30 AM-2:30 PM PDT).
Alternate exam times will be offered for the midterm and final, and the form to request the alternate will be posted on Piazza during the second week of classes. The alternate midterm is March 10th 8:00-10:00 AM PST and the alternate final is May 12th 7:00-10:00 PM PDT. The primary purpose of the alternate exam is to accommodate different timezones, but students with documented conflicts and unique personal circumstances may also be approved to take the alternate exam.
Students will be allowed to submit regrade requests for the autograded and written portions of assignments in cases in which the rubric was incorrectly applied or the autograder scored their submission incorrectly. Regrades for the written portions of assignments will be handled through Gradescope, and autograder regrades via a Google Form.
Always check that the autograder executes correctly! Gradescope will show you the output of the public tests, and you should see the same results as you did on DataHub. If you see a discrepancy, ensure that you have exported the assignment correctly and, if there is still an issue, post on Piazza as soon as possible.
Regrade requests will not be considered in cases in which:
- a student uploads the incorrect file to the autograder
- the autograder fails to execute and the student does not notify the course staff before the assignment deadline
- a student fails to save their notebook before exporting and uploads an old version to the autograder
- a situation arises in which the course staff cannot ensure that the student’s work was done before the assignment deadline
- a student submits without following the steps outlined in @101 on Piazza
Undergraduate Grading Scheme (for students enrolled in Data C100):
| Component | Weight | Count and drops |
|---|---|---|
| Homeworks | 35% | 12 + 1 bonus, with 2 drops |
| Labs | 10% | 13, with 3 drops |
| Section Assignments (small and large) | 20% | 12, with 3 drops |
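As a rough illustration of how drops affect a category score, the sketch below computes a lab average for 13 labs with 3 drops. It rests on assumptions the syllabus does not state: that drops remove the lowest scores, that the remaining assignments are weighted equally, and that scores are fractions between 0 and 1. The scores themselves are made up.

```python
def category_average(scores, num_drops):
    """Mean of `scores` after removing the `num_drops` lowest ones.

    Assumes drops apply to the lowest scores and that all kept
    assignments count equally (not confirmed by the syllabus).
    """
    if not 0 <= num_drops < len(scores):
        raise ValueError("num_drops must be between 0 and len(scores) - 1")
    kept = sorted(scores)[num_drops:]
    return sum(kept) / len(kept)


# Made-up scores for 13 labs, each as a fraction of full credit.
lab_scores = [1.0, 1.0, 0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]

lab_avg = category_average(lab_scores, num_drops=3)
lab_contribution = 0.10 * lab_avg  # labs are 10% of the undergraduate grade

print(round(lab_avg, 3))           # 0.95
print(round(lab_contribution, 4))  # 0.095
```

Here the three missed labs are dropped, so the lab average is computed over the remaining ten scores.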
Graduate Grading Scheme (for students enrolled in Data C200):
| Component | Weight | Count and drops |
|---|---|---|
| Homeworks | 45% | 12 + 1 bonus, with 2 drops |
All assignments are due at 11:59 PM on the due date specified on the syllabus. Gradescope is where all assignments are submitted. Extensions are only provided to students with DSP accommodations, or in the case of exceptional circumstances. If these conditions apply, please make a private Piazza post and email firstname.lastname@example.org to request an extension. Homeworks and labs will not be accepted late. Gradescope may allow you to make late submissions, but you will later be given a 0.
Collaboration Policy and Academic Dishonesty
Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually in your own words. If you do discuss the assignments with others please include their names at the top of your notebook. Keep in mind that content from assignments will likely be covered on both the midterm and final.
If we suspect that you have submitted plagiarized work, we will call you in for a meeting. If we then determine that plagiarism has occurred, we reserve the right to give you a negative full score (-100%) or lower on the assignments in question, along with reporting your offense to the Center for Student Conduct.
Rather than copying someone else’s work, ask for help. You are not alone in this course! The entire staff is here to help you succeed. If you invest the time to learn the material and complete the assignments, you won’t need to copy any answers. (taken from 61A)
We also ask that you do not post your assignment solutions publicly.
Cheating on exams is a serious offense. We have methods of detecting cheating on exams – so don’t do it! Students caught cheating on any exam will fail this course. We will be following the EECS departmental policy on Academic Honesty, so be sure you are familiar with it.
We want you to succeed!
If you are feeling overwhelmed, visit our office hours and talk with us. We know college can be stressful – and especially so during the COVID-19 pandemic – and we want to help you succeed.