
Data Science: Working with Python

Getting started with Python

If you're a student in the Data Science major, you'll be learning Python through your coursework. The resources here are meant to supplement that learning and to give you avenues for pursuing more specific interests (e.g., machine learning or web scraping).

If you are not a Data Science student, these resources are still useful! Learning a programming language can help automate your research, whether you're working in biology, physics, social science, or some other domain. For those new to programming in general, the "Introductory Python tutorials" section is the place to start.

Download Python

First things first, you'll need to download Python, which is free. You can download Python by itself from the Python Software Foundation.
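Once Python is installed, a quick way to confirm that it runs is to ask it for its own version. The snippet below is a minimal sketch that uses only the standard library; the function name `describe_python` is made up for this illustration.

```python
import sys


def describe_python():
    """Return a short, human-readable description of the running interpreter."""
    v = sys.version_info
    return f"Python {v.major}.{v.minor}.{v.micro}"


# Print the version of the interpreter that is running this script.
print(describe_python())
```

If this prints a version number, the installation is working. The exact number depends on which release you downloaded.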

Introductory Python tutorials

Advanced Python tutorials

We have quite a few advanced Python books available through the library. Some are available only as physical copies, but many are available as e-books. Try searching the library catalog, UC Library Search, for "python" to see our entire collection.

In the meantime, these books may be useful.

Jupyter Notebook tutorials

Python Libraries

One of the main benefits of Python is the vast array of pre-existing packages (also called libraries), written by other Python users and available for installation. You can find Python packages on PyPI, the Python Package Index. 
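Packages from PyPI are typically installed from the command line with `python -m pip install <package-name>`. To check from within Python whether a package is already installed, and which version you have, something like the following sketch may help. It assumes Python 3.8 or newer (where `importlib.metadata` is in the standard library); the helper name `installed_version` and the package names in the loop are only examples.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Example package names (assumptions for illustration); substitute your own.
for name in ["numpy", "pandas"]:
    found = installed_version(name)
    if found is None:
        print(f"{name} is not installed")
    else:
        print(f"{name} {found} is installed")
```

Note that the name used here is the package's name on PyPI, which is not always the same as the name you use in an `import` statement.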

This overview of popular Python libraries provides a starting point for finding applicable libraries. For more advanced users, this comprehensive list of packages by topic includes links to further resources. 

If you are using the Anaconda distribution of Python, many libraries come pre-installed. This tutorial covers the steps needed to install additional packages.
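To see which libraries your current environment can already import, you could run a short check like the one below. This is a rough sketch, not an official Anaconda tool: the helper `is_importable` is hypothetical, and the module names listed are common data science libraries chosen as examples rather than a statement of what any particular distribution ships with.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# Example module names (assumptions for illustration); edit to suit your work.
for name in ["numpy", "pandas", "matplotlib"]:
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

Anything reported as missing would need to be installed before you can import it.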

Here are some resources for popular data science Python libraries: 

 

Read here for an overview of some other data science-related packages, including wget (downloading files from the web), FlashText (cleaning text for natural language processing), PyFlux (time series data), and more.