Early English Books Online (EEBO) TCP

About EEBO-TCP

EEBO-TCP is a partnership with ProQuest and with more than 150 libraries to generate highly accurate, fully-searchable, SGML/XML-encoded texts corresponding to books from the Early English Books Online (EEBO) Database.

Early English Books Online

The EEBO-TCP corpus consists of the works represented in the Early English Books Online collections known as Short Title Catalogues I and II (based on the Pollard & Redgrave and Wing short title catalogs respectively), as well as the Thomason Tracts and the Early English Books Tract Supplement collections. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The books in these collections include works of literature, philosophy, politics, religion, geography, history, mathematics, music, the practical arts, natural science, and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, theology, linguistics, and history, but these resources also include core texts in art, women’s studies, history of science and medicine, law, and music.

The following are but a small sampling of the authors whose works are included: Erasmus, Shakespeare, King James I, Marlowe, Galileo, Caxton, Chaucer, Malory, Boyle, Newton, Locke, More, Milton, Spenser, Bacon, Donne, Hobbes, Purcell, Behn, and Defoe.

Creating full-text transcriptions

Phase I

From 2000 to 2009, Phase I successfully converted more than 25,000 selected texts from the Early English Books Online corpus. The 25,368 texts in EEBO-TCP Phase I, initially available only to institutions that contributed to their creation, were released to the general public on January 1, 2015, and are therefore currently available for access, distribution, use, or reuse by anyone.

Phase II

Begun in 2009, Phase II both shrank and expanded the scope of EEBO-TCP. Selection became more discriminating, focusing on English-language (and Welsh- and Gaelic-language) texts to the exclusion of French and Latin titles, and setting aside the serials (periodicals) as a fit project for another time. But within the constraints of English-language monographic titles, it aspired to something approaching comprehensive treatment: EEBO-TCP Phase II planned to convert each and every unique work in Early English Books Online (usually the first edition), or an estimated total of around 45,000 books on top of the 25,000 completed in Phase I. This was an ambitious, and always risky, goal. As it happened, enough institutions joined Phase II to fund the completion of about 40,000 titles, of which about 35,000 have been released to date, with the remainder slowly working their way through the production pipeline.

As of 2019, the total number of books available in Phase II came to 34,963, with a further release of several thousand additional titles tentatively scheduled for later in the year.  Short of an infusion of new funding, or the adoption of a new production model, this should bring the active work of the TCP to at least an interim conclusion.

Currently, EEBO-TCP Phase II texts are available (with minor exceptions) only to authorized users at partner libraries. By contractual stipulation, ProQuest currently has the exclusive right to distribute the EEBO-TCP Phase II corpus to new customers, or to users outside the original Phase II partnership. When this window expires, the texts will be released freely to the public. By long arrangement, this removal of restrictions will occur on or about January 1, 2021.

How we selected texts

Selection of works to transcribe for EEBO-TCP Phase I was initially based on named authors mentioned in the New Cambridge Bibliography of English Literature. Though this tended to bias selection a bit toward canonical, or at least attributed, works, anonymous works may also have been selected at this stage if their titles appeared in the bibliography. The New Cambridge Bibliography of English Literature was chosen as a guideline because it included foundational works as well as less canonical titles related to a wide variety of fields, not just literary studies. In any case, this initial reliance on the New Cambridge soon gave way to a series of deliberate attempts to cast a wider net, for example by selecting works exemplifying a particular theme (food, drugs, piracy, witchcraft), or fitting a particular format (broadsides, pamphlets, etc.). The intention was to supplement methodical selection with more or less random selection based on arbitrary criteria in order to expand the generic diversity of the corpus. Requests for particular works by faculty at partner institutions were also taken into consideration and, if feasible, placed at the head of the queue. A user willing and able to make a case for a given work almost always prevailed over other considerations.

Aside from method, selection always followed a set of general principles. Where possible, we prioritized selection of first editions, adding later editions, especially authorial editions, only when they were known to represent significant change or expansion over the first. We avoided texts that offered very limited textual content (heavily mathematical, musical, or numerical works, such as almanacs). We avoided works for which ProQuest had not yet provided a catalogue record, because we preferred to reserve our energies for keying and editing rather than original cataloguing. We avoided works predominantly in a non-Latin alphabet (in our period, mostly Greek and Hebrew). We also avoided works that were so poorly printed or preserved as to be illegible or barely legible. “Stock” works frequently republished with slight differences (prayer books, Psalters, Bibles) were represented in our selection only by a few salient editions. Because our funding was limited, we aimed to key as many different works, and so as much different text, as possible.

It is worth pointing out, since the question arose during our discussions of selection policy, that a given work was not passed over for encoding simply because it was available in another electronic collection. 

Language was something of a vexed issue throughout. Our original intent was to be inclusive, that is, indifferent to language, since so many authors of the period, and so much literature of the period, crossed linguistic boundaries. We set up separate editing and proof-reading operations for Latin (at Toronto) and Welsh (at the National Library of Wales in Aberystwyth), of which the latter at least achieved its goal of producing a finished transcript of every Welsh-language work in EEBO. And by special arrangement with the National Library of Scotland, we selected and keyed every extant work in Scots Gaelic (plus a few in Irish). But with regard to the major languages (English and Latin), when it came time to plan for Phase II, we concluded that it was better to aim at comprehensive English-language coverage than half-hearted English and Latin coverage. Realizing also that the supply of expert Latin keyers and proof-readers was limited, we chose to confine selection for Phase II almost entirely to predominantly English-language works.

Finally, again through considerations of efficiency, since the body of newsbooks and periodical literature of the 17th century formed a compact and manageable corpus, we chose to exclude serials almost entirely from our efforts, reasoning that we, or a later project, could take them on in due course as a worthwhile undertaking in its own right.

Completing the project

The TCP, whether regarded as an innovative and collaborative business model, a corpus of texts, a set of methods and standards, or an organization, has proven itself by producing a marvelous historical resource. Production is winding down; the last few thousand texts should be released in 2019 to the partnership and next year to the world at large. But because there is work still to do, and proven means to do it, the TCP prefers to regard itself and its projects as still relevant, if only as a model, and better described as “in hiatus” than, strictly speaking, “done.”