Graduate Final Project


Anyone can choose to complete the final project. Students enrolled in Stat C200C or CS C200A, the graduate version of the course, are required to complete the final project. For students enrolled in Stat C100, the final project is optional but allows for an alternate grading option. See the grading page for details. This final project is in addition to the required course projects released during the semester.

Due Date: Your final project will be due during RRR week. The exact deadline is TBD; updates will be posted on Piazza as they become available.

Checkpoint: You must fill out this form describing your project by 11:59pm Friday, November 22nd (login required). Only one person per partner group needs to fill out the form. As mentioned below, if you choose Option 1, you will also need to meet with a professor or GSI to have your project idea approved before submitting this form.

Project Report:

Option 1 (Design a Project): Your project submission should be a single notebook that has the format of a research paper. It should include a title, a list of authors, an abstract, an introduction, a description of the data, a description of the methods, a summary of results, and a discussion. The notebook should also include all code and visualizations. Make sure to number figures and tables and include informative captions.

Option 2 (Image Classification): Your project submission should follow the guidelines in the specification below.

Partners: You must partner with one other classmate to complete the project. If you choose Option 1 and would like to have a team of 3 or 4, speak with a GSI or instructor about your justification for a larger team during your project proposal meeting with them.

Presentations: The Data 100/200 Project Fair will be held sometime during RRR week. Presenting your project is required for both options. If you can’t make the fair time, we will provide the opportunity to present your project at an alternate time.

Scoring: Your project will be scored based on the submitted report. If you present your project, the person who will score your project will also attend your presentation for additional context.

Project Choices

There are two options for the final project: pick your own question and dataset, or follow the recommendations we have provided.

Option 1: Design a Project

The purpose of this project is to carry through a data science workflow and put into practice what you have learned in this course in a more open-ended setting than the assignments. Specifically, the project should involve the following steps.

  1. Frame a question of your choice that can be addressed by identifying, collecting, and analyzing relevant data.
  2. Describe and obtain the data.
  3. Perform exploratory data analysis (EDA) and include in your report at least two (but probably many more) data visualizations.
  4. Describe any data cleaning or transformations that you perform and why they are motivated by your EDA.
  5. Apply relevant inference or prediction methods (e.g., linear regression, logistic regression, or classification and regression trees), including, if appropriate, feature engineering and regularization. Use cross-validation or test data as appropriate for model selection and evaluation. Make sure to carefully describe the methods you are using and why they are appropriate for the question to be answered.
  6. Summarize and interpret your results (including visualization). Provide an evaluation of your approach and discuss any limitations of the methods you used.
  7. Describe any surprising discoveries that you made and future work.
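
As an illustration of the cross-validation mentioned in step 5, here is a minimal sketch in plain Python. It is not a required template: the helper names (`k_fold_indices`, `fit_line`, `cv_mse`), the choice of simple linear regression, the number of folds, and the toy data are all assumptions made for this example. In a real project you would more likely use a library implementation and your own model.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cv_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        # Fit only on the points that are not in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Evaluate only on the held-out fold.
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Hypothetical toy data that is exactly linear, so the error is near zero.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
print(cv_mse(xs, ys))
```

The point of the sketch is that each model is fit only on the training folds and scored only on the held-out fold, so the averaged error estimates performance on unseen data.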

In order to ensure that you have applied the course materials in sufficient scope, we impose the following two additional requirements.

  • The analysis should involve at least one of the inference or prediction methods presented in this course.
  • The dataset should have at least six distinct variables (i.e., columns) and a sample size (i.e., rows) of 50 or more. Much larger datasets are encouraged. Smaller datasets must be approved by the instructors via e-mail.
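
A quick way to confirm that a dataset meets the size requirement is to count its rows and distinct columns before you start. The sketch below uses only the Python standard library; the function name `meets_size_requirements` and the toy CSV are hypothetical, and it assumes the data is a CSV with a single header row.

```python
import csv
import io

MIN_COLUMNS = 6   # at least six distinct variables
MIN_ROWS = 50     # sample size of 50 or more

def meets_size_requirements(csv_text):
    """Return (n_rows, n_columns, ok) for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_columns = len(set(header))              # count distinct column names
    n_rows = sum(1 for row in reader if row)  # ignore blank lines
    ok = n_columns >= MIN_COLUMNS and n_rows >= MIN_ROWS
    return n_rows, n_columns, ok

# Hypothetical toy data: 6 columns and 60 rows.
toy_header = "a,b,c,d,e,f"
toy_rows = "\n".join(",".join(str(i + j) for j in range(6)) for i in range(60))
toy_csv = toy_header + "\n" + toy_rows

print(meets_size_requirements(toy_csv))  # (60, 6, True)
```

Passing this check only covers the minimum size; whether the variables are suitable for your question still needs to be argued in the report.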

To do Option 1, you must have your project proposal approved by a professor or a GSI who is able to approve proposals. You must do this in person, before the November 22nd deadline for submitting the checkpoint form.

Option 2: Image Classification

Download this ZIP archive and read the specification to complete the pre-defined final project about image classification.