Linguistics: Data & Corpora

NOTE: We now have a membership in the Linguistic Data Consortium (LDC). If you'd like a login, sign up for an account at the link below, and you will be added as a user with access to our paid datasets. Contact Tamara Rhodes at tlrhodes@ucsd.edu if you have any questions.

Linguistics Data & Corpora

About: a faith-based nonprofit organization committed to serving language communities worldwide as they build capacity for sustainable language development.

Instructions: search for your topic, then use the "work type" filter to limit the results to data sets.

Tools & Software

An open-source software library for advanced Natural Language Processing, written in the programming languages Python and Cython.

Resources

Doing Corpus Linguistics

A practical step-by-step introduction to corpus linguistics.

Practical Corpus Linguistics

Provides a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.

The Cambridge Handbook of English Corpus Linguistics

Surveys the breadth of corpus-based linguistic research on English, including chapters on collocations, phraseology, grammatical variation, historical change, and the description of registers and dialects.

The World Atlas of Language Structures

The World Atlas of Language Structures is a book and CD combination displaying the structural properties of the world's languages.
