
Data Science: Guide for Independent Projects

Introduction

Many data science students eventually want to undertake an independent or personal side project. This guide provides resources for these types of projects. It is not necessarily intended as guidance for course projects, internship deliverables, or other formalized projects. Rather, it is meant to help you, as a data science student, get a little extra experience working with data.

The benefits of these types of projects are threefold: (1) applying what you've learned in your coursework to a new topic, which tests your knowledge; (2) learning new skills, including new Python or R packages and other platforms/tools; and (3) producing an output you can put on your resume. If you get really into your project, you can also consider turning it into a guest blog post on a data science site, or otherwise sharing your work with a broader audience.

Getting started

 

Still not sure? Use the choice wheels below to help brainstorm a project topic.

Guided projects

Maybe you're not ready to start a project entirely from scratch. That's fine! These links have examples of more guided projects: they provide a dataset, a general question, and either tutorials, full source code, or hints about what packages and analyses you'll need to use. Think of these as "training wheels projects": they are a way to build your confidence and help you get comfortable with projects outside of class.

Starting projects from scratch

Make use of the other resources in this guide! Check out the "Working with Python" and "Working with R" tabs for information about data analysis and visualization packages. Read through the "Version Control & GitHub" tab for additional information about working with Git and how to properly structure a GitHub repository. The "Finding Data & Statistics" tab redirects to a full guide to help with finding data sources and the "Data Visualization" tab will send you to additional resources about data visualization, including best practices.
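If you are unsure how to lay out a brand-new project before pushing it to GitHub, the short Python sketch below shows one possible starting point. The folder names and the `scaffold_project` helper are illustrative assumptions, not a layout prescribed by this guide; adjust them to fit your project.

```python
from pathlib import Path

# Hypothetical layout: these folder names are an assumption for illustration,
# not a standard defined by this guide.
LAYOUT = {
    "data/raw": "Original, unmodified data files.",
    "data/processed": "Cleaned data produced by your scripts.",
    "notebooks": "Exploratory analysis.",
    "src": "Reusable analysis code.",
    "figures": "Exported charts and other visuals.",
}


def scaffold_project(root):
    """Create the folders in LAYOUT under root, add a starter README, and return the root path."""
    root = Path(root).resolve()
    for folder in LAYOUT:
        # exist_ok=True makes the function safe to run more than once
        (root / folder).mkdir(parents=True, exist_ok=True)

    readme = root / "README.md"
    if not readme.exists():  # never overwrite a README you have already written
        lines = [f"# {root.name}", "", "## Folders", ""]
        lines += [f"- `{folder}/`: {purpose}" for folder, purpose in LAYOUT.items()]
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return root
```

For example, calling `scaffold_project("my-first-project")` would create the folders and a README inside a new `my-first-project` directory, which you could then turn into a Git repository.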

Project examples for beginners

Sometimes, you want to look at fully formed examples to get an idea of what you can do for your own project. Here are some examples of data science (or at least, data science-ish) projects suitable for lower-division data science students: the projects use available data, (mostly) make the underlying code public, produce effective and interesting visuals, and are easy to read through. These examples also span a range of project options, such as making a tutorial for a popular or frequently used dataset, learning new techniques, scraping your own data, or digging into a big dataset.

 

Also consider reaching out to your fellow data science students about forming a group to work on an independent project. Group projects are a great way to develop important skills such as code collaboration (particularly using GitHub) and project workflow management. Working with a group also provides a built-in network for brainstorming ideas, troubleshooting code errors, and formalizing your project. Plus, it can be more motivating to work in a group, since you're relying on each other to make progress.

Alternatively, if you prefer to work on your own project, it is still valuable to reach out to other people for code review. Reviewing someone else's code is a useful learning exercise, and having your own code reviewed by your peers is a good way to catch mistakes you may have missed.

More advanced projects

 

If you have an advanced project idea, look into whether it could be a good fit for the HDSI Undergraduate Scholarship Program. About the program:

"Unlike lab-directed projects, students will be able to choose their own research topics and lead the research process. Scholarships will provide opportunities for students to work closely with a mentor to develop analytical skills, develop data science portfolios, and foster novel data-driven approaches to problem solving.

Examples of data-driven projects include applications of methods, tools, and infrastructure for heterogeneous dataset integration, machine learning, geospatial analyses, scalable computing, data visualization, data ethics, and privacy. Priority will be given to applications that employ novel and creative data scientific approaches with specific potential impact to application areas."

Building a portfolio

When working on a personal project, you are building your data science portfolio, a public collection of your work you can share with future employers. A good data science portfolio will include a mix of code, data visualizations, and narrative. 

Having a well-organized GitHub profile is a great start to building your data science portfolio. Remember: many repositories on GitHub are public, so you can look at other people's data science portfolios and projects to get a sense of style and format. You may also eventually decide to create your own website. The format of your portfolio may vary; the important thing to keep in mind is that this is a way to showcase your work for future employers.