Data and scripts from: Rubenstein Library card catalog


  • This dataset includes the data, code, and files used and created by the Duke University Data+ 2021 Rubenstein Library Card Catalog Team. Working with the digitized cards from the physical card catalogs of the David M. Rubenstein Rare Book and Manuscript Library, our team explored the files to further the library's initiative of finding and describing historically marginalized voices in its collections.

    Using natural language processing and some manual editing, we built a structured dataset from the OCRed text of the scanned cards. The dataset is sorted by collection of items within the catalog and contains important metadata such as author, location, and date written. With it, we analyzed what and who is present in the cards and presented our findings in Jupyter Notebook files. We explored the demographics of the authors and items cataloged, analyzed how the information in the cards relates to the history of Duke University, and examined the common topics in the data. We completed spatial frequency mapping at the level of the United States and of North Carolina counties, and we visualized the other countries that appear in the cards. The files hold a wealth of information, and our Data+ project only scratches the surface. We hope that future research teams will continue to dissect the card files to gain insights into Duke's history and the contents of the library's collections.

Total Size
13 files (111 MB)
Data Citation
  • 10.7924/r4br8v905
Publication Date
  • ark:/87924/r4br8v905
  • Durham
Funding Agency
  • Duke Rhodes Information Initiative