Text Data Mining Resources

Library resources that allow TDM

UH Libraries subscribes to the resources below for text and data mining (TDM). Each resource has its own terms and conditions, so read them carefully before you access the data. To suggest or request data not listed on this page, please submit this form.

BMJ Online

BMJ works with researchers to facilitate TDM projects. See terms and conditions.


JSTOR

JSTOR accommodates text analysis and digital humanities research by providing datasets for the journals, books, research reports, and pamphlets in its digital library. Request a dataset here.


Springer Nature

For subscribed journals and books, Springer Nature grants researchers text and data mining rights via their institutions, provided the purpose is non-commercial research. See more details on Springer Nature TDM policies.

Wiley Online Library

Academic subscribers can perform TDM under license on subscribed content for non-commercial purposes at no extra cost. See the TDM agreement here for more details.

Openly available content for TDM

Many resources are freely available to researchers for text and data mining. The examples below are good starting points for gathering data to use for TDM. Boston College Libraries has also created a list of freely available TDM resources.

HathiTrust Research Center

HathiTrust Research Center (HTRC) creates and maintains a suite of tools and services for text-based, data-driven research, such as HTRC Algorithms and Data Capsule, and engages in cutting-edge research on large-scale data analysis.

HathiTrust Data API

HathiTrust makes the text data of public domain works in its collection available to researchers for direct bulk download, for non-commercial research purposes.

New York Times APIs

New York Times Developers provides selective access to NYT content, including article metadata, article search, movie reviews, and book reviews.
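
As a rough illustration of gathering data from one of these services, the sketch below queries the NYT Article Search API and collects headlines. It assumes the v2 Article Search endpoint and a response shaped as `response.docs[].headline.main`. The function names are hypothetical, and the API key is a placeholder you would obtain by registering on the NYT developer site. Check the current API documentation and terms of use before relying on any of these details.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the NYT Article Search API (v2); verify against the
# current developer documentation before use.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Build a request URL for a keyword search (hypothetical helper)."""
    params = {"q": query, "page": page, "api-key": api_key}
    return ARTICLE_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_headlines(payload):
    """Pull main headlines out of a decoded JSON response.

    Assumes the layout response -> docs -> headline -> main; missing
    pieces yield an empty list or empty strings rather than an error.
    """
    docs = (payload.get("response") or {}).get("docs") or []
    return [(doc.get("headline") or {}).get("main", "") for doc in docs]


def search_headlines(query, api_key, page=0):
    """Fetch one page of results and return the headlines (needs network access)."""
    url = build_search_url(query, api_key, page)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_headlines(json.load(resp))
```

To try it, call `search_headlines("text mining", "YOUR_API_KEY")` with your own key. The service applies rate limits, so a larger collection job should pause between requests.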