About Data 100
Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand their world. This intermediate-level class bridges between Data 8 and upper-division computer science and statistics courses as well as methods courses in other fields. In this class, we explore key areas of data science including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, this class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.
- Prepare students for advanced Berkeley courses in data management, machine learning, and statistics by providing the necessary foundation and context
- Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques
- Empower students to apply computational and inferential thinking to address real-world problems
While we are working to make this class widely accessible, we currently require the following (or equivalent) prerequisites. We are not enforcing prerequisites during enrollment. However, all of the prerequisites will be used starting very early on in the class. It is your responsibility to know the material in the prerequisites.
- Foundations of Data Science: Data 8 covers much of the material in Data 100 but at an introductory level. Data 8 provides basic exposure to Python programming and working with tabular data as well as visualization, statistics, and machine learning.
- Computing: The Structure and Interpretation of Computer Programs (CS 61A) or Computational Structures in Data Science (CS 88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable Data 100 to focus more on the concepts in Data Science and less on the details of programming in Python.
- Math: Linear Algebra (Math 54, EE 16a, or Stat89a): We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This prerequisite may be satisfied concurrently with Data 100.
This summer, Data 100 will be run entirely online. This section details exactly how each component of the course will operate. Here is a high-level view of a typical week in the course:
|Day |Assignments
|Monday |Homework A released; Homework B (from the previous week) due
|Thursday |Homework B released; Homework A due
|Sunday |Labs A, B released
|Saturday |Labs A, B due
Note that these deadlines are subject to change.
- To see when any live events are scheduled, check the Calendar.
- To see when lectures, discussions, and assignments are released (and due), check the Home Page.
This course has discussion sections on Mondays and Wednesdays, lasting for one hour each. There will also be some sections on Tuesdays and Thursdays. The goal of these sessions is to work through problems, hone your skills, and flesh out your understanding as part of a team. The problems that you solve and present as part of discussion are important in understanding this material.
To encourage attendance and participation in live discussion, we will offer the option of having discussion contribute to your grade. Specifically, points you earn from attending/participating in discussion can reduce the weighting of exams on your overall course grade. See the grading breakdown below for details.
- There are 4 lectures per week.
- Lectures will be entirely pre-recorded, in a format that is optimized for online learning (short 5-10 minute videos with optional conceptual problems in between). Lecture videos will be released on the mornings of Monday, Tuesday, Wednesday and Thursday at 9:40 AM PT.
- Many of these will be from previous semesters, but some will be recorded this summer by the instructors.
- Lecture videos will be posted on YouTube. Each “lecture” will be an html page linked on the course website, containing videos and links to slides and code.
- Each lecture will also have a Piazza thread for students to ask questions.
Homeworks are assignments that are designed to help students develop an in-depth understanding of both the theoretical and practical aspects of ideas presented in lecture.
- In a typical week, there will be two homeworks. The first will be released Monday morning, by 9:40 AM PT, and will be due Thursday at 11:59 PM PT. The second homework will be released Thursday morning at 9:40 AM PT, and will be due the following Monday at 11:59 PM PT. There may be some deviation in this schedule to account for holidays, exams, etc.
- Most homeworks will be Jupyter notebooks; one or two homeworks will be on-paper written assignments.
- Homeworks have both visible and hidden autograder tests. The visible tests are mainly sanity checks, e.g. a probability is <= 1, and are visible to students while they do the assignment. The hidden tests generally check for correctness, and are invisible to students while they are doing the assignment.
- The primary forms of support for homeworks and projects are the office hours we'll host and Piazza.
- Homeworks must be completed individually.
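To illustrate the difference between visible and hidden tests described above, here is a minimal sketch in Python. It is not taken from the course autograder: the function `prob_heads_at_least_once` and both checks are hypothetical, chosen only to show how a sanity check differs from a correctness check.

```python
def prob_heads_at_least_once(n_flips):
    """Hypothetical student answer: probability of at least one head
    in n_flips independent flips of a fair coin."""
    return 1 - 0.5 ** n_flips


answer = prob_heads_at_least_once(3)

# Visible-style sanity check: any probability must lie between 0 and 1.
# Passing this does not mean the answer is right.
assert 0 <= answer <= 1

# Hidden-style correctness check: compares against the exact value,
# which here is 1 - (1/2)^3 = 0.875.
assert abs(answer - 0.875) < 1e-9
```

An answer such as `0.5` would pass the first check and fail the second, which is why passing every visible test does not guarantee full credit.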
Labs are shorter programming assignments designed to give students familiarity with new ideas.
- In a typical week, there will be two labs, often covering content taught in lecture the same week. Both labs will be released simultaneously on Sunday morning by 9:40 AM PT, and they will both be due on Saturday at 11:59 PM PT.
- All lab autograder tests are visible.
- In previous semesters, we held live lab sections; this semester, we will not be holding live sections for labs. Instead, the labs have been condensed and simplified, and will include a video walk-through to assist students in completing the assignment.
- Students can get help with labs at office hours and on Piazza.
- We plan on hosting roughly 10 hours of office hours each weekday. These hours are listed on the Calendar.
- OH will serve as a one-stop shop for students to get help with assignments.
- Office Hours can be accessed via Gather where students can collaborate with other students while waiting to receive help from course staff. More details in @257 on Piazza.
- The instructors will also be hosting conceptual office hours. These will be reflected on the Calendar.
There will be one midterm exam, on July 15th (9:30 AM - 11:00 AM PT), and a final exam on August 12th (9:30 AM - 12:30 PM PT).
Alternate exam times will be offered for the midterm and final, and the form to request the alternate will be posted on Piazza soon after the start of class. The alternate midterm is July 15th from 8:00 PM - 9:30 PM PT and the alternate final is August 12th from 8:00 PM - 11:00 PM PT. The primary purpose of the alternate exam is to accommodate students in different timezones, but students with documented conflicts and unique personal circumstances may also be approved to take the alternate exam.
Students will be allowed to submit regrade requests for the autograded and written portions of assignments in cases in which the rubric was incorrectly applied or the autograder scored their submission incorrectly. Regrades for the written portions of assignments will be handled through Gradescope, and autograder regrades via a Google Form.
Always check that the autograder executes correctly! Gradescope will show you the output of the public tests, and you should see the same results as you did on DataHub. If you see a discrepancy, ensure that you have exported the assignment correctly and, if there is still an issue, post on Piazza as soon as possible.
Regrade requests will not be considered in cases in which:
- a student uploads the incorrect file to the autograder
- the autograder fails to execute and the student does not notify the course staff before the assignment deadline
- a student fails to save their notebook before exporting and uploads an old version to the autograder
- a situation arises in which the course staff cannot ensure that the student’s work was done before the assignment deadline
- a student submits without following the steps outlined in @11
|12, with 2 drops
|13, with 2 drops
The remaining 50% of your grade will be the maximum of two scores. You do not need to explicitly select one or the other—we will automatically determine the maximum for you.
|13, with 2 drops
Your discussion score is the average of your scores for each individual discussion. Each discussion will be graded on a 0/1 basis. As of now, that one point will be determined based on attendance—if you attend a discussion section, you will receive the point for that discussion section. However, we reserve the right to increase the threshold to earn this point, for example, by requiring some form of participation. Note that cameras are encouraged, but are NOT required in discussion section.
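The "maximum of two scores" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, scores are expressed as fractions between 0 and 1, and `disc_weight` (the share of this portion shifted from exams to discussion) is a placeholder value rather than the actual course weighting. Dropped discussions are not modeled.

```python
def discussion_score(attendance):
    """Average of per-discussion scores, each graded 0 or 1."""
    return sum(attendance) / len(attendance)


def remaining_portion(exam_score, disc_score, disc_weight=0.1):
    """Return the larger of two candidate scores for this portion.

    disc_weight is an ASSUMED placeholder, not the official weighting.
    """
    # Option 1: exams alone determine this portion of the grade.
    exams_only = exam_score
    # Option 2: discussion replaces part of the exam weighting.
    with_discussion = (1 - disc_weight) * exam_score + disc_weight * disc_score
    return max(exams_only, with_discussion)


# A student who attended 3 of 4 discussions has a discussion score of 0.75.
disc = discussion_score([1, 1, 0, 1])

# Exam score below the discussion score: discussion helps (about 0.615).
helped = remaining_portion(0.60, disc)

# Exam score above the discussion score: exams alone are used (0.90).
unchanged = remaining_portion(0.90, disc)
```

Because the maximum is taken, counting discussion can never lower the result below the exams-only score.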
All assignments are due at 11:59 PM PT on the due date specified on the syllabus. Gradescope is where all assignments are submitted. Homeworks and labs will not be accepted late. Gradescope may allow you to make late submissions, but you will later be given a 0.
Extensions are only provided to students with DSP accommodations, or in the case of exceptional circumstances. If these conditions apply, please make a private Piazza post to request an extension. When posting, please put [EXTENSION REQUEST] in the title, and select the “extension” folder. Do not email the instructors with extension requests. If you make a request close to the deadline, we cannot guarantee that you will receive a response before the deadline. Additionally, simply submitting a request does not guarantee you will receive an extension. Even if your work is incomplete, please submit before the deadline so you can receive credit for the work you did complete.
Note that extension requests will not be granted in cases where a student’s local (DataHub) tests are not passing. It is the student’s responsibility to solve such problems in advance of the deadline.
Collaboration Policy and Academic Dishonesty
Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually in your own words. If you do discuss the assignments with others please include their names at the top of your notebook. Keep in mind that content from assignments will likely be covered on both the midterm and final.
If we suspect that you have submitted plagiarized work, we will call you in for a meeting. If we then determine that plagiarism has occurred, we reserve the right to give you a negative full score (-100%) or lower on the assignments in question, along with reporting your offense to the Center for Student Conduct.
Rather than copying someone else’s work, ask for help. You are not alone in this course! The entire staff is here to help you succeed. If you invest the time to learn the material and complete the assignments, you won’t need to copy any answers. (taken from 61A)
We also ask that you do not post your assignment solutions publicly.
Cheating on exams is a serious offense. We have methods of detecting cheating on exams – so don’t do it! Students caught cheating on any exam will fail this course. We will be following the EECS departmental policy on Academic Honesty, so be sure you are familiar with it.
We want you to succeed!
If you are feeling overwhelmed, visit our office hours and talk with us. We know college can be stressful – and especially so during the COVID-19 pandemic – and we want to help you succeed.