Data for Research is a free service for researchers wishing to analyze content on JSTOR through a variety of lenses and perspectives. If you require more than 1,000 documents or a type of data not available through the interactive portion of the site, please contact us at: firstname.lastname@example.org
Download 440 million words of full-text data for COCA, or 1.8 billion words for GloWbE. With this data, you will have the corpora on your computer, rather than having to use the web interface. The data comes in three formats: tables for relational databases, word/lemma/PoS (vertical format), or text (linear format).
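As a rough illustration of working with the word/lemma/PoS (vertical) format, the sketch below reads one token per line and counts lemmas. It assumes each line holds three tab-separated columns in the order word, lemma, part-of-speech tag; the actual downloaded files may include extra columns or header lines, so check them before relying on this. The names `Token` and `parse_vertical` and the sample lines are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str


def parse_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per line of word<TAB>lemma<TAB>PoS text.

    Assumption: exactly the first three tab-separated fields matter.
    Lines with fewer than three fields (e.g. blank lines) are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        word, lemma, pos = fields[:3]
        yield Token(word, lemma, pos)


# Made-up sample lines for illustration only (not real corpus data).
sample = [
    "The\tthe\tat",
    "dogs\tdog\tnn2",
    "barked\tbark\tvvd",
    "",
    "dogs\tdog\tnn2",
]

# Count how often each lemma occurs in the sample.
lemma_counts = Counter(token.lemma for token in parse_vertical(sample))
```

For a real file, the same function can be fed an open file handle, for example `parse_vertical(open(path, encoding="utf-8"))`, since it only iterates over lines and never loads the whole file into memory.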
- ICWSM 2011 Spinn3r Dataset
The dataset, provided by Spinn3r.com, is a continuation of the 2009 Spinn3r Dataset. It consists of over 386 million blog posts, news articles, classifieds, forum posts, and other social media content published between January 13th and February 14th, 2011.
- ICWSM 2009 Spinn3r Blog Dataset
The dataset, provided by Spinn3r.com, is a set of 44 million blog posts made between August 1st and October 1st, 2008.
- JDPA Sentiment Corpus
The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. The posts have been manually annotated for named, nominal, and pronominal mentions of entities, and each entity is marked with the aggregate sentiment expressed toward it in the document. Note that these datasets are free, but researchers will need to contact the ICWSM and sign a usage agreement to be granted access.