ECCO-TCP resulted from a partnership with Gale, part of Cengage Learning, to produce highly accurate, fully searchable, SGML/XML-encoded texts from among the 150,000 titles available in Gale’s Eighteenth Century Collections Online (ECCO) database.
Eighteenth Century Collections Online
Eighteenth Century Collections Online includes significant English-language and foreign-language titles printed in the United Kingdom during the 18th century, along with thousands of important works from the Americas. In all, the database contains more than 32 million pages of text and more than 205,000 individual volumes. ECCO also natively supports OCR-based full-text searching of this corpus. This mattered because, unlike EEBO-TCP (which produced searchable text where none had existed before), ECCO-TCP could only hope to produce text that was more accurate and more reusable than what was already available. The larger size of ECCO, a result of the great increase in printing and the much better survival rate of printed works in the 18th century, also made it a different proposition: nothing as ambitious as EEBO-TCP’s coverage was feasible for ECCO-TCP.
Creating full-text transcriptions
Because of these greater challenges, ECCO-TCP is perhaps better described as a proof of concept than as a completed project. With the support of more than 35 libraries, the TCP keyed, encoded, edited, and released 2,231 ECCO-TCP texts. A further tranche of texts was keyed and encoded but never fully proofed or edited. These texts nevertheless remain useful for many purposes, and they bring the total number of ECCO-TCP texts to roughly 3,000. In cooperation with Gale Cengage, all of these texts have been made freely available to the public. Users working with the EEBO-TCP texts may find the ECCO-TCP texts a useful adjunct, because in selecting ECCO-TCP titles some attempt was made to include authors who straddled the divide between the 17th and 18th centuries. The idea was that authors whose earlier works appeared in our 17th-century corpus could be “completed” by adding their later works to our 18th-century (ECCO-TCP) corpus. This helps account, for example, for the heavy representation of Defoe in ECCO-TCP.
Because there are no longer any restrictions on how the ECCO-TCP texts may be used and shared, many users have already begun to make the data and metadata available in various forms and formats around the Web:
- Search, browse, or read the ECCO-TCP corpus through the University of Michigan Library’s digital collections platform (original ECCO-TCP partners can also link from the full-text view to the corresponding ECCO page images)
- Search the ECCO-TCP corpus and view results via ARTFL’s PhiloLogic search engine (thanks to Robert Morrissey)
- Download the original SGML/XML-encoded texts and headers from the TCP (originally encoded using a customization of TEI P3, but preferably obtained in the XML version with bibliographic headers)
- Download TEI P5 XML, EPUB, plain text, or HTML for each text from the Oxford Text Archive (thanks to Sebastian Rahtz)
- Use, in consultation with its editors, the Corpus of Late Modern English Medical Texts at Helsinki, which is based on ECCO-TCP texts requested by researchers there
The ECCO-TCP files were heavily used, and hosted, by 18thConnect during its experiments with improving OCR for historical typefaces in the eMOP project, which used our files as its “ground truth.” This illustrates one purpose to which ECCO-TCP texts may be put. TCP texts also remain full-text searchable in 18thConnect (when results appear in a list, a notice reads “Text provided by the TCP”), and 18thConnect hopes to substitute the TCP texts for OCR in its crowd-sourced correction engine, TypeWright, by early 2020.
We hope that at least one of these options will meet your research needs, but we also welcome your questions, suggestions, and requests for alternatives (or, we hope you’ll build something that works for you, and let us know about it!).