Most of our licensed databases usually
We can provide bulk downloads of the following ProQuest databases:
Most of our other licensed ProQuest content, including news databases, is available for analysis with R or Python through the ProQuest TDM Studio. Contact Data Science Librarian Stephanie Labou with any questions.
Other databases that support text analysis in some way:
TDM Studio is the text and data mining interface for more than 200 licensed ProQuest content products, including government, archival, dissertation, and news databases. Content is available for analysis with R or Python in the workbench dashboard. Alternatively, the visualization dashboard lets you interact with and visualize content without any coding. Contact Stephanie Labou (slabou@ucsd.edu) for additional assistance.
To create an account:
1. Go to https://tdmstudio.proquest.com
2. Click the “Create an account” button.
3. Use your UCSD email address to create your account.
Explore UCSD holdings from Gale Primary Sources using digital humanities text and data mining tools. No coding required! Rediscover and interpret the past through analysis and visualization of historical texts, including newspapers, books, archival collections, and more. (Create your personal DSL account online to begin selecting and analyzing materials. If you are off campus, connect to the VPN so that your account works properly.) Learn more via tutorials and recorded webinars.
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories that hosts a repository devoted to acquiring, archiving, preserving and distributing linguistic corpora. These corpora are searchable via the LDC catalog. UC San Diego's membership allows UC San Diego students, faculty, and employees to register for a login that, once approved, provides free access to the datasets included with our membership years and a 50% discount on other datasets. Note that these datasets can only be used for educational, non-commercial text and data mining projects.
Includes a broad range of official and ephemeral information resources issued by federal agencies, individual officials and candidates, and other organizations from all branches of the U.S. Federal Government, and links that content to publicly accessible government documentation. Includes social media, official media releases, legislation, regulations, and a variety of government documents from Congress and the Executive Branch. Textual data can be visualized in word clouds, tree maps, bubble graphs, and terms view graphs. Users who sign up for an account and agree to additional terms of service can download a small number of full documents; researchers and students with non-commercial, academic projects can apply with VoxGov for additional bulk data download credentials.
HathiTrust Research Center (HTRC) enables computational analysis (text and data mining) of works in the HathiTrust Digital Library (HTDL) to facilitate non-profit research and educational uses of the collection. HTRC creates and maintains a suite of tools and services for text-based, data-driven research, such as HTRC Algorithms and Data Capsule, and engages in cutting-edge research on large-scale data analysis. HTRC operates under a non-consumptive research paradigm: HTRC makes available the collection for computational analysis, while remaining clearly within the bounds of the fair use rights courts have recognized as applying to text analysis. The Center is committed to breaking new ground in the areas of non-consumptive text mining, allowing scholars to fully utilize content of the HathiTrust Digital Library.
Search across all of UC San Diego's Adam Matthew archival collections. We also have text and data mining access to our licensed Adam Matthew databases.
Collections include:
African American Communities
Age of Exploration
America in World War Two: Oral Histories and Personal Accounts
American History, 1493-1945
American Indian Histories and Cultures
American Indian Newspapers
American West
Apartheid South Africa, 1948-1980
China, America and the Pacific
China: Culture and Society
China: Trade, Politics and Culture, 1793-1980
Church Missionary Society Periodicals
Colonial America
Colonial Caribbean
Confidential Print: Africa, 1834-1966
Confidential Print: Latin America, 1833-1969
Confidential Print: Middle East
Confidential Print: North America
Defining Gender
East India Company
Eighteenth Century Drama
Eighteenth Century Journals
Empire Online
Ethnomusicology: Global Field Recordings
Everyday Life and Women in America
First World War Portal
Food and Drink in History
Foreign Office Files China, 1919-1980
Foreign Office Files India, Pakistan and Afghanistan, 1947-1980
Foreign Office Files Japan, 1919-1952
Foreign Office Files Middle East, 1971-1981
Foreign Office Files South East Asia, 1963-1980
Frontier Life
Gender: Identity and Social Change
Global Commodities
India, Raj and Empire
Interwar Culture
J. Walter Thompson: Advertising America
Jewish Life in America
Leisure Travel and Mass Culture
Life at Sea: Seafaring in the Anglo-American Maritime World, 1600-1900
Literary Manuscripts Berg
Literary Manuscripts Leeds
Literary Print Culture
London Low Life
Macmillan Cabinet Papers, 1957-1963
Market Research and American Business, 1935-1965
Mass Observation Online
Medical Services and Warfare
Medieval Family Life
Medieval Travel Writing
Meiji Japan
Migration to New Worlds
Perdita Manuscripts, 1500-1700
Popular Culture in Britain and America, 1950-1975
Popular Medicine in America, 1800-1900
Race Relations in America
Romanticism: Life, Literature and Landscape
Service Newspapers of World War Two
Sex and Sexuality
Shakespeare in Performance
Shakespeare's Globe Archive
Slavery, Abolition and Social Justice
Socialism on Film
The Grand Tour
The Nixon Years, 1969-1974
Trade Catalogues and the American Home
Travel Writing, Spectacle and World History
Victorian Popular Culture
Victorians on Film. Entertainment, Innovation & Everyday Life
Virginia Company Archives
Women in the National Archives (UK)
World's Fairs
JSTOR and Portico are building a text and data mining (TDM) platform aimed at teaching and enabling a generation of researchers to text mine. The platform includes a user interface that allows researchers, students, and instructors to curate, visualize, and save custom datasets. Researchers may download the extracted features of their curated datasets. Extracted features are a non-consumptive “bag-of-words” representation in which each article or book chapter in the custom dataset is represented by bibliographic metadata, the unique set of words on each page, and the number of times each word occurs on the page. The dataset includes journals, books, and newspapers from JSTOR, Portico, and Chronicling America.
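To make the “bag-of-words” idea concrete, the following Python sketch builds a per-page word count for a small sample document and then sums the pages into document-level counts. It is only an illustration: the record layout, the field names ("metadata", "pages", "title", "page_count"), and the simple letters-only tokenizer are assumptions made for this example, not the actual format or processing used by the JSTOR and Portico platform.

```python
from collections import Counter
import re


def page_bag_of_words(page_text):
    """Count how often each word occurs on one page (lowercased, letters only)."""
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return Counter(tokens)


def extract_features(title, pages):
    """Build a hypothetical extracted-features record for one document.

    The layout (metadata plus one word-count mapping per page) is an
    assumption for illustration, not the platform's real schema.
    """
    return {
        "metadata": {"title": title, "page_count": len(pages)},
        "pages": [dict(page_bag_of_words(text)) for text in pages],
    }


def document_term_counts(record):
    """Sum the per-page counts into one document-level word count."""
    total = Counter()
    for page_counts in record["pages"]:
        total.update(page_counts)
    return total


# Two short sample "pages"; word order is discarded, only counts remain.
record = extract_features(
    "Sample article",
    ["The whale, the sea.", "The sea again."],
)
totals = document_term_counts(record)
print(record["pages"][0])
print(totals.most_common(2))
```

Because only counts are kept, the original sentences cannot be reconstructed from the record, which is what makes this kind of representation non-consumptive while still supporting frequency-based analysis.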
Please email The Library with questions about any specific resource or database.