This directory contains files used to document the creation of SGML/XML-encoded text under the auspices of the Text Creation Partnership (EEBO-TCP, Evans-TCP, ECCO-TCP), at the University of Michigan Library. These are working files not intended for public distribution.
VENDOR DOCUMENTATION
- Keying/encoding instructions, version 3 (partial revision 2004)
- Detailed guidelines for capturing the textual information in the EEBO items. (Version 1 and Version 2 are still available).
- Sample pages
- Index to 25+ sample pages from potential EEBO items, each presented as a page image or pair of page images (in .pdf) and a corresponding transcription (in SGML).
- Calculating EEBO error rates
- Documentation of sampling procedures and error-rate calculations
- Examples of errors
- Examples of “excusable” and “inexcusable” character-level transcription errors
- “Illegible” ($) overused (1)
- Examples of text unnecessarily marked as illegible
- “Illegible” ($) overused (2)
- More examples of text unnecessarily marked as illegible
- “Illegible” ($) overused (3) and (4)
- Yet more examples of text unnecessarily marked as illegible
- The other extreme: guessing
- Examples of text captured without sufficient warrant in the damaged original
- More of the same
- More examples of creative capture
- Roman numerals
- Two special problems with roman numerals: overlining and backwards-c
- TEI guidelines
- TEI P3 documentation, including element-by-element descriptions
- EEBO tagging “cheat sheet”
- Supplies a summary description of each of the elements of the EEBO tag set (prepared for purposes of internal training)
- DIV TYPEs
- List of common and preferred values for the TYPE attribute
- Decorated initials
- Page of sample decorated initials (and non-decorated large initials for comparison)
- Apothecaries’ symbols
- Capture of apothecaries’ symbols (ounce, dram, scruple, etc.) as found in medical recipes.
- Alchemical symbols
- Some samples of alchemical symbols, with suggestions for capture (draft)
- Unusual symbols used as note markers
- Suggested capture for notes that use unusual symbols as markers.
- Noting subtle font changes
- Examples of subtle typeface changes to be marked with <HI> or <Q> (etc.).
- Inverted letters
- Examples of letters accidentally printed upside-down.
- Sample alphabets from Caxton’s press: his ‘type 1’ font; his ‘type 2’ font; his ‘type 3’ font; his ‘type 4’ font; his ‘type 5’ font; his ‘type 6’ font
- Alphabets and letter combinations extracted from Caxton’s type fonts, with tentative instructions on capture (to be revised as various letter combinations are seen in context in the books themselves).
- Additional symbols
- A supplement to the main keying instructions.
- Character capture issues (March 2005)
- Five proposed areas of change and innovation in character capture:
- When there’s nothing there…
- Quick summary of the treatment of blanks and things missing.
INTERNAL (REVIEWERS’) DOCUMENTATION
- [in progress] All characters list
- Experimental list of all available character entities, with pictures. [will never be as up to date as the auto-generated charent list on which it is based.]
- All character entities
- List of all available charents (both TCP-created and ISO sets) with displayable forms as used in derivative XML version of texts [auto-generated from character map file (see below)].
- Additional symbols/charents
- Growing list of symbols for reviewers to recognize and supply beyond those in vendor instructions
- More odd uses of symbols and characters
- Especially math
- Overview of review process
- Basic guide to the in-house review process as a whole
- How to proof
- Step-by-step guide to the proofing stage (preparing and proofing sample)
- How to review
- Step-by-step guide to the tag-review stage (reviewing and correcting book)
- How to end
- Step-by-step guide to the final stage (checking in and reporting)
- More Latin abbrevs
- Further examples of Latin abbreviations, etc.
- Ambiguous abbreviations
- Examples of, and draft policy on, ambiguous characters and symbols (also includes more examples of apothecaries’ measures)
- Anglo-Saxon type
- [now moved to vendor area]
- Greek type and ligatures
- Early modern Greek type and its characteristic forms and ligatures: introduction and a few unorganized samples
- Hijacked symbols
- Some thoughts on symbols pressed into duty against their will.
Reviewers’ questions and tips relating to …
- Structure
- Using DIVs to group like things; Using GROUP instead of BODY for several texts with common title front and/or back matter; DIVs and LETTER tags; Songs embedded in plays; Using Q for “raisins in oatmeal”; OPENERs and CLOSERs as holdalls; Dialogues and Catechisms: Questioner and Responder; When pages are in the wrong order.
- Notes and Milestones
- Note markers; Note placement; Handling endnotes; STAGE and NOTE combined; Use of MILESTONE unit attribute; MILESTONEs with illegible values; Multiple notes with a single reference
- Captions, Headings, and Quotations
- Captions in figures; ARGUMENTS in verse; Quotations on title pages; Authorial interjections in quotations; Changing &startq; into <Q> and <HI>; <Q>s broken by <P>s; Q+BIBL inside HEAD; Q+BIBL inside TRAILER; Using running header for division header; Placement of epigraphs.
- Letters
- New tag: POSTSCRIPT; DIV versus LETTER; SALUTE and SIGNED; Use of DATELINE and DATE (DATELINE and SIGNED, DATELINE without DATE, including dating system within DATE); Sample CLOSERs with problems; Correct sample CLOSERs and SIGNEDs; Lists of signatories.
- Matters philosophical
- Correcting illegibilities; Counting in/excusable errors; Purpose of DIV types; Printer’s errors.
- Matters miscellaneous
- Superscripts, including superscript o; Clarifying UNCLEAR; Long or short lines in verse; Abbreviations and abbreviation entities; Tagging “Explicit”s; Editing TABLEs; Acrostic poem; “Spoken by…” in plays; Letters for rubricator.
- Title Page matters
- Proofing the title page; Handling epigraphs on title pages; Imprimaturs, approbations, licenses
- Software tips (esp. TextPad)
- TextPad clip libraries; TextPad upgrades; TextPad syntax file (for color-coding tags); downloading EEBO pdfs.
- Divisions (DIVs)
- Assigning div types; Sample div types; Use of “N” attribute alongside “TYPE”
- Lists
- Lists with curly braces; Genealogies as lists; Tables of Contents and Indexes as lists; Changes to the model of LIST; Syllogisms as lists
- Character capture issues
- Z and yogh in Scottish texts; Other uses of z; I/J; Illegibilities
Code
- TCP dtd version 1.0 (SGML version) for use in 2001
- TCP dtd version 2.0 (SGML version) for use in 2002+
- TCP character entity selection (auto-generated from TCP character map file)
For internal use only
- TCP dtd for the use of TCP staff MURPs (SGML version) for use in 2002+
- TCP XML dtd for indexing and online delivery only (adds SUP and SUB tags not found in SGML version)
- TCP SGML dtd for character map file (below)
- TCP character map file (complete list of character entities mapped to various replacement strings and supplied with Unicode equivalents, either canonical or private-use, from which file-conversion hashes and the SGML character-entity files can be derived)
Vendors’ coding and capture queries (all very old)
- (No. A1) Re: Drama tags (<SP>, <SPEAKER>) in non-dramatic dialogs. Marginal notes and numbers in prose texts. Page-level illegibility (see now P12 instead).
- (No. A2) Re: Milestones.
- (No. A3) Re: Musical notation.
- (No. A6) Re: Single table, illustr., etc. spanning multiple pages.
- (No. P1) Re: Marginal notes and numbers in prose texts. Strange “q”-like character in Latin passage.
- (No. P2) Re: Odd characters: stars, pointing fingers, and dot-triplets.
- (No. P3) Re: Braces; <STAGE> directions; marginal notes IMPLICITLY linked to asterisks in the text.
- (No. P4) Re: Interlinear numbers in a “puzzle” poem; <SPEAKER> tags; <SPEAKER>s identified only by number.
- (No. P5) Re: ee and oo ligatures with acute accent marks
- (No. P7) Re: Numbers appearing usually (but not always) at beginnings of <P>s; specialized vs. default (fallback) tagging; blocks of text after FINIS (<BACK> matter).
- (No. P8) Re: identifying <LETTER>s buried in running text.
- (No. P9) Re: missing t.p.; verse paragraphs; poetic letters; analytical summary table of contents; list vs. table; lapidary inscriptions; fractions; mismatched catchwords (missing pages?)
- (No. P10) Re: text attached to figures; acrostics printed at an angle.
- (No. P11) Re: in-line figures; overlining (of roman numerals).
- (No. P12) Re: damaged and illegible text; out-of-sequence pages
- (No. P13) Re: song lyrics interspersed with musical notation
- (No. P15) Re: duplicate pages: capture both or only one, and if only one, which?
- (No. P16) Re: right-justified words at ends of verse lines
- (No. P17) Re: multiple typefaces used concurrently, partly to mark quotations
- (No. T1) Re: miscellaneous tagging problems exemplified.
- Question log (1) regarding the bidding process
- Questions (with answers) received from data conversion firms, as well as updates and announcements.
- Question log (2) regarding setup and production
- Questions (with answers) received from data conversion firms, as well as updates and announcements.
Accumulated Wisdom garnered by the Oxford staff
N.B.: this section appeared originally on the web site of Oxford’s Bodleian Library, and represented (mostly) a compilation of email responses to particular issues in the capture and encoding of early modern books.
Encoding
- <ADD>
Mistaken use of ADD tags in LETTERs;
Use of ADD tags for typewritten material;
ADD with handwritten material;
ADD in CLOSERs;
- <CLOSER>
Addressee in CLOSER;
Notary signatures;
Use of POSTSCRIPT;
Problem CLOSERs;
Signatures tagged as SALUTE;
DATELINE tagged as SIGNED;
Correctly tagged CLOSER with DATELINE;
CLOSER or TRAILER?;
CLOSER, SIGNED, for “quoth x”;
Unusual CLOSERs in LETTERs;
DATELINE or SIGNED for descriptions of person or place?;
“directions” after CLOSER in letters;
- DIV types
EEBO DIV TYPEs; ECCO DIV TYPEs;
DIV TYPE="list of authors";
DIV type for lists of Scripture references;
DIV TYPE="envoy" (1);
DIV TYPE="register";
DIV type for quoted remarks;
DIV type for kings;
DIV TYPE="corroboration";
Q at the end of DIV; envoy (2);
DIV TYPE="imprimatur";
DIV TYPE="title";
DIV TYPE="publisher’s advertisement";
DIV TYPE="testimonial";
DIV TYPE="approbation";
DIV TYPE="envoy" (3);
DIV TYPE="docket title";
DIV TYPE="attestation";
DIV TYPE="advertisement" etc.;
DIV TYPE="mittimus" etc.;
DIV TYPE="versions";
DIV TYPE="index" vs DIV TYPE="table of contents";
DIV TYPE or N?
- Drama
Placement of STAGE directions;
Use of STAGE for heads of speeches;
Use of drama tags in other types of material;
Use of STAGE for heads of speeches;
“spoken by” as STAGE or OPENER;
“spoken by” in STAGE not BYLINE;
STAGE used in non-drama;
STAGE used in musical texts;
“enter chorus” as STAGE and SPEAKER;
use of Q within dialogues and plays;
STAGE in musical texts?;
- <FIGURE>
Captions functioning as HEADs;
Uncaptured material in captions;
FIGURE before HEAD;
Tags allowed in FIGURE;
Two related thoughts on FIGUREs;
Placing FIGURE inside HEAD;
Multiple PBs in fold-out maps;
FIGURE appearing at the end of text;
FIGURE before HEAD?;
Porphyry’s tree diagram;
TRAILER type="illustration";
Use of FIGURE for title page borders?;
Scheme attribute of FIGDESC;
Capturing printers’ devices;
Guide letters in rubrication;
Sample tagging of frontispiece with FIGURE, FIGDESC, and BYLINE;
- <GAP>
Correcting illegibles;
“Intruder” GAPs;
Missing material of less than a page;
Another intruder GAP;
“Blank” GAPs;
Mdash or blank gap?;
Material omitted in print;
- <LETTER>
Marginal quotes;
Letters broken by comments;
Complex openings in LETTERs;
Superscription in LETTER;
- <LG>
Acrostic poems;
Songs in drama;
LGs with indented text;
Obsolete type attributes for LGs;
Nested LGs for songs in drama;
Short or Long Verse Lines?;
Verse numbers in metrical psalms;
Lines entirely in HI;
Short or long lines? (3);
- <LIST>
TRAILERS in LISTs;
Complex nested LISTs;
Simplifying an index;
Simplifying LISTs with nesting;
Simplifying LISTs with LABEL and ITEM;
More on LABEL and ITEM;
LISTs with just one ITEM;
Syllogisms;
ROLE="label" attribute for LABEL and ITEM;
LISTs with curly braces;
Syllables alongside syllogisms;
Nested LISTs used for genealogies;
Options for table of contents tagging;
LIST type="sum";
Mixture of LISTs and verse;
- <NOTE>
Endnotes;
Multiple NOTE sets;
Interlinear NOTEs;
Endnotes and markers in the text;
Index fingers tagged as MILESTONE;
UNIT values in MILESTONEs;
LETTERs broken by comments;
Linking Endnotes;
Difference between REF and PTR;
Automatically moving NOTEs next to markers;
Endnote or marginal note?;
Correct use of endnotes;
Strung out marginal notes;
- <OPENER>
Tagging “to the tune of”;
OPENER and CLOSER as holdalls;
DATELINE without DATE;
Complex OPENERs in songs;
“To the honour of God” as OPENER;
Complex opening to dedication;
Where to split HEADs and OPENERs;
DIV TYPE="ballad"; more on HEADs and OPENERs in songs;
- <Q>
Q versus HI REND="marginal quotes";
&startq; and &endq; to mark dialogue;
When to use HI REND="marginal quotes";
One-off dilemma;
Q and BIBL within HEAD;
Two Ps within one Q, or two Qs?;
Authorial interjections in quotations;
Use of Q and BIBL in HEAD (2);
Q or TRAILER at the end of chapters;
Use BIBL where there are no Q tags?;
Thus far in BIBL;
- <TABLE>
Parallel text or TABLE?;
Use of TABLEs for paired quotations;
TABLE tagging: rowspans;
Point-by-point structure in parallel text;
Transcription
- Abbreviations and Ligatures
Esq;;
Ambiguous ae/oe forms;
German eszett/ss/tz;
Barred q symbol meaning “qui” or “quam”?;
More on barred q;
E hook, eogon, and ae ligature;
G with an abbreviation stroke;
Caxton’s barred double-l;
Abbreviations for Christ;
Removing ABBR round barred double-l;
Abbreviation for “qui”;
Caxton’s “d-flourish”;
Abbreviation for “quartern(e)”;
- Fonts
Anglo-Saxon mistranscribed;
Dealing with Gaelic/Irish fonts;
Anglo-Saxon P and wyn;
z/yogh; p/thorn;
Scottish yogh and sz;
- Foreign alphabets
Greek symbol made up of o and u;
Standardization of Hebrew character entities;
Breathings in Greek;
Problems with capturing Greek;
Breathings in Greek (2);
- Miscellaneous
Where to find the latest charent file;
Blocks of upside down characters;
Upper-case Z in the middle of words;
Whitespace capture;
- Punctuation
Half slashes captured as commas;
Use of &punc; within a word;
Full stops in the shape of crosses;
Space between elided article and noun in French;
Mdash or blank gap?;
ct ligature for ampersands;
End-of-line hyphens;
- Symbols
- Handwritten Symbols;
Jupiter v. Recipe;
Date symbols;
Stand-alone symbols;
X as denarius;
Infinity symbol;
Squares and Quadrines;
Use of generic and specific cross entity references;
Rotated index fingers;
Reversed section symbol;
Rx symbol;
Fractions;
Note-markers;
Technical
Miscellaneous
- Matters philosophical
- Correcting illegibilities; Counting in/excusable errors; Purpose of DIV types; Printer’s errors.
- Matters miscellaneous
- Superscripts, including superscript o; Clarifying UNCLEAR; Long or short lines in verse; Abbreviations and abbreviation entities; Tagging “Explicit”s; Editing TABLEs; Acrostic poem; “Spoken by…” in plays; Letters for rubricator.