About: a faith-based nonprofit organization committed to serving language communities worldwide as they build capacity for sustainable language development.
Instructions: search for your topic, then use the "Work type" filter to limit your results to data sets.
TDM Studio is the text and data mining interface for more than 200 licensed ProQuest content products, including government, archival, dissertation, and news databases. Content can be analyzed with R or Python in the Workbench dashboard, or explored and visualized without any coding in the Visualization dashboard. Contact Stephanie Labou (slabou@ucsd.edu) for additional assistance.
To create an account:
1. Go to https://tdmstudio.proquest.com
2. Click the “Create an account” button.
3. Use your UCSD email address to create your account.
A practical step-by-step introduction to corpus linguistics.
Provides a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.
Surveys the breadth of corpus-based linguistic research on English, including chapters on collocations, phraseology, grammatical variation, historical change, and the description of registers and dialects.
The World Atlas of Language Structures is a book-and-CD set that displays the structural properties of the world's languages.