UH Libraries subscribes to the resources below for text and data mining (TDM). Each resource has its own terms and conditions, so read them carefully before you access the data. To suggest or request data not listed on this page, please submit this form.
BMJ works with researchers to facilitate TDM projects. See the terms and conditions.
JSTOR supports text analysis and digital humanities research by providing datasets for the journals, books, research reports, and pamphlets in its digital library. Request a dataset here.
For subscribed journals and books, Springer Nature grants researchers text and data mining rights through their institutions, provided the purpose is non-commercial research. See Springer Nature's TDM policies for more details.
Academic subscribers can perform TDM under license on subscribed content for non-commercial purposes at no extra cost. See the TDM agreement here for more details.
Many resources are freely available to researchers for text and data mining. The examples below are a good starting point for gathering data to use for TDM. Boston College Libraries has also created a list of freely available TDM resources.
HathiTrust Research Center (HTRC) creates and maintains a suite of tools and services for text-based, data-driven research, such as HTRC Algorithms and Data Capsule, and engages in cutting-edge research on large-scale data analysis.
HathiTrust makes the text data of public domain works in its collection available for researchers to download directly in bulk for non-commercial research purposes.
New York Times Developers provides selective access to NYT content, including article metadata, article search, movie reviews, book reviews, and more.
If you have questions about accessing data for TDM-related work, please contact Emma Fontenot, Social Science Data Librarian.