Data Science: Guide for Independent Projects

Introduction

Many data science students eventually want to undertake an independent or personal side project. This guide provides resources for these types of projects. It is not necessarily intended as guidance for course projects, internship deliverables, or other formalized projects. Rather, it is meant to help you, as a data science student, get a little extra experience working with data.

The benefits of these types of projects are threefold: you (1) apply what you've learned in your coursework to a new topic, testing your knowledge; (2) learn new skills, including new Python or R packages and other platforms/tools; and (3) produce an output you can put on your resume. If you get really into your project, you can also consider turning it into a guest blog post on a data science site, or otherwise sharing your work with a broader audience.

Getting started

 

Still not sure? Use the choice wheels below to help brainstorm a project topic.

Guided projects

Maybe you're not ready to start a project entirely from scratch. That's fine! These links have examples of more guided projects: they provide a dataset, a general question, and either tutorials or hints about what packages and analyses you'll need to use. Think of these as "training wheels projects": they are a way to build your confidence and help you get comfortable with projects outside of class.

Starting projects from scratch

Make use of the other resources in this guide! Check out the "Working with Python" and "Working with R" tabs for information about data analysis and visualization packages. Read through the "Version Control & GitHub" tab for additional information about working with Git and how to properly structure a GitHub repository. The "Finding Data & Statistics" tab redirects to a full guide to help with finding data sources and the "Data Visualization" tab will send you to additional resources about data visualization, including best practices.

Project examples for beginners

Sometimes, you want to look at fully formed examples to get an idea of what you can do for your own project. Here are some examples of data science (or at least, data science-ish) projects suitable for lower-division data science students: the projects use available data, (mostly) make the underlying code public, produce effective/interesting visuals, and are easy to read through. These examples also span a range of project options, such as making a tutorial for popular/frequently used datasets, learning new techniques, scraping your own data, or digging into a big dataset.

 

Also consider reaching out to your fellow data science students about forming a group to work on an independent project. Group projects are a great way to develop important skills such as code collaboration (particularly using GitHub) and project workflow management. Working with a group also provides a built-in network for brainstorming ideas, troubleshooting code errors, and formalizing your project. Plus, it can be more motivating to work in a group, since you're relying on each other to make progress.

Alternatively, if you prefer to work on your own project, it is still valuable to reach out to other people for code review. Reviewing someone else's code is a useful learning exercise, and having your own code reviewed by your peers is a good way to catch mistakes you may have missed.

More advanced projects

Portfolio examples

When working on a personal project, you are building your data science portfolio: a public collection of your work that you can share with future employers.

Having a well-organized GitHub with each project in its own repository is a great start to building your data science portfolio. You may eventually decide to create your own website. The format of your portfolio may vary; the important thing to keep in mind is that this is a way to showcase your work.

For an in-depth guide to developing your data science portfolio, check out this site from UC Davis DataLab.