We are a consortium of (mostly) university and college libraries that have joined together to create standardized, accurate, and faithful XML/SGML-encoded electronic text editions of early printed books. We’ve transcribed and marked up text — through manual keying, rather than optical character recognition (OCR) — from millions of static page images in ProQuest’s Early English Books Online, Gale Cengage’s Eighteenth Century Collections Online, and Readex’s Evans Early American Imprints.
To date, the project has created more than 70,000 transcribed and encoded historical texts. Its scope and scale are unprecedented among digitization and text-encoding projects of its kind.
A public-private partnership, led by libraries
This project represents a unique partnership between public and private sector institutions. Many partner libraries joined to ensure that early modern book content, which is in the public domain, remain perpetually accessible to scholars and the wider public regardless of academic affiliation.
Our policies were imbued with a librarian’s attitude toward content: a resolve to prepare materials without agenda or bias, and with a view toward wide use and reuse.
Through our partnership with private vendors, we had access to a huge trove of images from which to transcribe. In return, these companies were supplied with a full-text index to their images, work that would otherwise have been difficult or expensive to produce.
Sharing our mission
Our work was jointly funded and is owned by more than 150 libraries worldwide. These libraries own the transcriptions and are committed to making them publicly available.
Full text can be searched using web interfaces provided by the University of Michigan Library (and, for internal use, at the University of Oxford). Subsets and modified versions of the files have also been hosted by other universities. Users in the United Kingdom can access the entire corpus through JISC’s Historical Texts portal.
Any institution or individual is free to host the publicly released portion of the texts in any system or interface they choose.
Commitment to the scholarly community
We’re mindful of the long-term needs of libraries, scholars, and the larger community. We maintain a commitment to the quality and cost-effectiveness of our content, and are guided by underlying principles that:
- Convey robust rights of use to scholars;
- Protect the public domain rights of the larger society to access out-of-copyright materials;
- Present the user with accurately keyed, modern-font texts that are faithful to the spellings and organization of the original works;
- Ensure that this content will migrate forward through shifts in technology, to represent editions of enduring value to libraries.
The net effect of our initiatives has been to maximize the respective strengths of commercial and academic digital library development, for the long-term benefit of researchers and students. This approach benefits the library community by:
- Entrusting conversion of important but difficult works to the university community, supporting appropriate scholarly review and intervention;
- Drawing upon community expertise to develop the scope and standards underlying such projects;
- Carrying forward the work in a cost-effective manner by distributing the costs across many academic institutions, as well as encouraging substantial contributions from commercial partners;
- Ensuring that partner libraries co-own the resulting text files, with robust rights to manage, reuse, and distribute them as they see fit, including the right to distribute texts beyond their authenticated users to other partner institutions.
Use and access
Arrangements between the Text Creation Partnership, its partner libraries, and its corporate partners have differed slightly between projects, but the overall structure in every case has given partner institutions, their students, and faculty the right to store, host, distribute, share, manipulate, alter, analyze, and otherwise work with the content from the moment of its creation. The same arrangements always provided for a “window of exclusivity” during which partner institutions are obliged to restrict any distribution to other partner institutions, followed by a removal of all restrictions, perpetual ownership, and perpetual public access.
Three of our four projects (Evans, ECCO, and EEBO Phase 1) have concluded their period of exclusivity, and those texts are now free from all licensing or copyright restrictions. The fourth project, EEBO Phase 2, remains in its exclusive phase through 2021, and those texts are therefore subject to Licensing and Access terms.
Our project began in 1999 as an experimental partnership among the university libraries of Michigan and Oxford, ProQuest, and the Council on Library and Information Resources (CLIR). The goal of the project was to produce standardized, digitally encoded electronic text editions of 25,000 titles from ProQuest’s Early English Books Online.
A working group developed an SGML DTD derived from TEI P3, the text encoding standard at that time, influenced by variants of TEI-Lite current among many library-based digitization shops. Staff at U-M Library developed a set of capture and encoding instructions using this DTD. Data-conversion vendors were asked to submit bids for keying and markup using these instructions. And thus work began: texts were selected each month at Michigan, page images were supplied by ProQuest, marked-up transcriptions were submitted by the vendors, and quality control and editing were undertaken at U-M Library and soon also at Bodleian Libraries in Oxford and subsidiary sites at the National Library of Wales, Aberystwyth, and at the University of Toronto.
During its course, TCP employed nearly a hundred editors and immediate production-related staff in Ann Arbor, Oxford, Aberystwyth, and Toronto, as well as dozens more in other roles, supporters willing to serve on the executive board and ad hoc working groups, and hundreds of keyers, editors, quality-control specialists, encoders, and managers at four data-conversion firms (Apex, SPi, Aptara, and AELData).
In 2009, EEBO-TCP met its goal of producing 25,000 books (a body of work thereafter known as “EEBO-TCP Phase 1”). It then undertook a second phase to convert the first edition of each remaining unique monographic work in EEBO: another 40,000 or so books, for a total of around 70,000 if all hopes were realized.
In 2005, the TCP executive board and staff sought to expand the TCP model to other databases of historical books, namely, Gale Cengage’s Eighteenth Century Collections Online (ECCO) and Newsbank Readex’s Evans Early American Imprints (Evans-TCP). These projects never received quite the support attracted by EEBO-TCP, and in the end produced only about 8,000 texts, compared to the 60,000 produced by EEBO-TCP, with another few thousand on the way.
Though the TCP was almost entirely funded by its members, it also received a grant from the National Endowment for the Humanities (NEH). Under this grant, a subset of texts related to travel and navigation was identified and converted to the same standards as other works. The resultant collection, EEBO-TCP Collections: Navigations, was made possible by a Humanities Collections and Reference Resources grant from the NEH Division of Preservation and Access. The project ran from May 1, 2013, to October 31, 2015. Though as a matter of timing this tranche of texts belongs to EEBO Phase 2, as a federally funded project the Navigations texts are free from the Phase 2 restrictions and open for public use.