Finding Data & Statistics: Text Corpora

A note about UCSD-licensed databases

Most of our licensed databases:

  • limit the number of citations or articles that can be downloaded at once
  • prohibit systematic downloading (downloading of substantial collections)
  • prohibit automated downloading (use of scripts)
  • prohibit data mining directly on the vendor's servers
  • prohibit the redistribution of content (including cleaned data)

We can provide bulk downloads of the following ProQuest databases:

  • Congressional Record (part A)
  • History Vault: Vietnam War collection
  • History Vault: Immigration collection (part 1)
  • Chicago Tribune, 1849-1933
  • Los Angeles Times, 1881-1933
  • New York Times, 1851-1937
  • Wall Street Journal, 1889-1935
  • Washington Post, 1877-1935
  • San Francisco Chronicle, 1865-1922
  • American Periodicals Series
  • Periodicals Archive Online (series 1-5) 

We also have API access to our licensed Adam Matthew databases.

In the past, LLMC Digital has been willing to provide bulk downloads of historic legal and government documents to affiliated researchers who contact them directly.

Please contact Annelise Sklar with questions about any specific resource or database.

Freely available corpora & bulk data

Tutorials