How to Review EEBO Materials

Note: Some of this may be more easily done using a parsing editor with display options (e.g. XMetaL or EPCedit), but these instructions presuppose that you are using the plain-text editor TextPad.

  1. ACQUAINT YOURSELF with the book. Proof-reading the book should have already given you some familiarity with it. Page through the book more widely now, looking for signs of overall structure and organization; for anomalies and exceptions of all sorts, especially those that violate the overall organization or are likely to have confused the keyers; and for potentially problematic formatting, e.g. math, music, tables, lists, and complex marginalia.

  2. OPEN THE SGM FILE in TextPad. Double-clicking should be enough.

  3. PASTE IN A TEMPHEAD in place of the existing DOCTYPE declaration. The temphead carries its own DOCTYPE declaration, so this replaces the keyers' rule set with ours. You may use a minimal temphead if you like, such as this:

    <CHANGE><DATE>yyyy-mm-dd</DATE><RESPSTMT><NAME>[name of reviewer]</NAME><RESP>MURP</RESP></RESPSTMT><ITEM>Proofed text and corrected markup.</ITEM></CHANGE>

    but many people use a more complicated one that contains a checklist of common problems to check, such as this:

    <CHANGE><DATE>yyyy-mm-dd</DATE><RESPSTMT><NAME>[name of reviewer]</NAME><RESP>MURP</RESP></RESPSTMT><ITEM>
    * Review overall document structure and hierarchy, including GROUP TEXT FRONT BODY BACK DIVs
    * Observe divtop and divbottom material marking beginnings and ends of divisions; check that they are correctly tagged as HEAD, HEADNOTE, CLOSER, OPENER, ARGUMENT, EPIGRAPH, SIGNED, DATELINE, BYLINE, etc.
    * Add TYPEs to DIVs in order to validate, making sure that the @type structure makes sense as a navigational aid, and that each type makes sense with respect to the one above it.
    * Survey book for troublesome formats of all kinds, including lists, illustrations, tables, music, math, end-notes, quasi-tabular arrangements, and marginalia. Make sure that they are correctly tagged. Decide how much is worth changing.
    * Look for unobtrusive numerations and other occult signs of structure. Do they mark divs? milestones?
    * Examine marginalia. Is it all best tagged as NOTE?
    * Look for illustrations, make sure that usable text within them has been captured and correctly tagged.
    * Add FIGDESCs to any and all likely illustrations.
    * Check placement and completeness of PBs.
    * Check for GAPs and #s.
    * If appropriate, check for yoghs.
    * If appropriate, check for Latin problems, e.g. oe and ae digraphs
    * If appropriate, check for abbreviation and brevigraph problems.
    * If appropriate, check for units of measure, symbols (alchemical, astronomical, etc.) and the like.
    * Look for decorated initials. Have they been marked as such?
    * Survey the illegibles. Decide whether resolving them is feasible or even possible. Use the 100 rule only as a rule of thumb. Resolve those you can.
    * Correct the errors found during proofing.
    * Proof title page(s).
    * Run 'skint'
    * Run 'check'
    * Run 'v'
    </ITEM></CHANGE>

    The existing templates include a good deal of boilerplate. Pieces of it that do not apply can be deleted. This area is used to record anything distinctive done to the text, or anything left undone, e.g. "blackletter text should have been tagged as HI throughout, but wasn't." Feel free to edit the templates if you find that they do not accurately reflect the most common tasks that you find yourself performing in the books. Many reviewers use the template as a quick checklist of things to do and look for. If you do not use it this way, you may prefer to use a very shortened form of template.

  4. TITLE PAGES. Proof title page(s): remove excess P tags (or, more rarely, insert additional P tags) and HI tags that record typeface changes used only for decorative effect. Generally, Ps should demarcate title/author and publisher/date. Title pages with lengthy sub-titles, epigraphs, etc. will usually require more Ps. Note any figures or quotations on the title page and tag them as <FIGURE> or <Q>.

  5. STRUCTURE. Page through the original book in order to get a sense of its structure. Pay particular attention to the usual cues: table(s) of contents and summaries, if any; "heads" and "feet" (and marginal text that serves the same purpose) that indicate the beginning and end of something; and numerical or sequential clues ("Firstly, Word"; "Secondly, Sacrament").

    Compare the structure applied by the vendor with that of the book, and correct it to match the book. Detailed multilevel structural hierarchies can often be left unmarked if it proves too much trouble to capture them, but this decision should be made only after you determine to your satisfaction what the real structure is and how much would be sacrificed.

    Typical vendor problems:
    • missing the lower levels in a multi-level hierarchy
    • treating whole poems as if they were merely stanzas (LGs)
    • missing signs of subordination (i.e., putting two sections at the same level instead of making one subordinate to the other)
    • using or abusing DIVs when what is really needed is <Q><TEXT><BODY><DIV1> ... </DIV1></BODY></TEXT></Q> as if the book had a 'grapefruit' structure instead of a 'raisins-in-oatmeal' structure.

  6. Note HEADS and FEET. Divisions usually have something at their head or foot, or both, to mark them off. We have various tags for these things (TRAILER, HEAD, ARGUMENT, EPIGRAPH, SIGNED, OPENER, CLOSER, etc.). It is usually convenient to check these at the same time that you are considering the structure.

  7. VALIDATE throughout the process. This is most easily done by choosing "v" from the tools menu in TextPad. Invalid files should never be regarded as done. (The file is valid when "v" yields no error messages.) Add TYPEs to DIVs using your knowledge of the overall structure as well as any local designations in the text.

    In TextPad, using Find in Files to search for <DIV[^>]*> (with binary, all matching lines, and regular expression checked) will list every DIV tag along with its TYPE, if it has one.

    Lack of TYPEs should be the primary (often the only) reason that a file fails to validate. Pursue invalid bits one by one till the file validates.
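
    For those who prefer a script to Find in Files, the same search can be sketched in Python. This is only an illustration, not one of the project's tools: the function name divs_missing_type is made up here, and the sketch assumes that each DIV start tag sits on a single line and that the attribute is written as TYPE=.

```python
import re

# Any DIV start tag (DIV, DIV1, DIV2, ...); end tags such as </DIV1> do not match.
DIV_TAG = re.compile(r"<DIV\d*\b[^>]*>", re.IGNORECASE)
# A TYPE attribute inside the tag (SUBTYPE= or similar would not count).
HAS_TYPE = re.compile(r"\bTYPE\s*=", re.IGNORECASE)


def divs_missing_type(sgml_text):
    """Return (line_number, tag) pairs for DIV start tags that have no TYPE."""
    missing = []
    for line_number, line in enumerate(sgml_text.splitlines(), start=1):
        for match in DIV_TAG.finditer(line):
            tag = match.group(0)
            if not HAS_TYPE.search(tag):
                missing.append((line_number, tag))
    return missing


# Hypothetical sample: the outer DIV1 has a TYPE, the inner DIV2 does not.
sample = '<DIV1 TYPE="poem">\n<DIV2>\n<HEAD>To the Reader</HEAD>\n</DIV2>\n</DIV1>'
for line_number, tag in divs_missing_type(sample):
    print(line_number, tag)
```

    Each DIV reported this way is a candidate for a TYPE; the file still has to be run through "v" to confirm that it validates.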

  8. RESOLVE FORMAT ISSUES. Survey book for troublesome formats of all kinds, including lists, illustrations, tables, music, math, end-notes, quasi-tabular arrangements, and marginalia. Make sure that they are correctly tagged. Decide how much is worth changing.
    1. LOOK FOR UNOBTRUSIVE numerations and other occult signs of structure. Do they mark divs? milestones?
    2. EXAMINE MARGINALIA. Is it all best tagged as NOTE?
    3. ILLUSTRATIONS. Look for illustrations, make sure that usable text within them has been captured and correctly tagged, including HEAD BYLINE Q L P. The vendors are often apt to omit captions or caption-like text if it looks at all unusual (e.g., if it surrounds the picture rather than sitting under it). Add these missing captions if you can read them: they provide important information about the illustrations. Prefer more text rather than less in association with figures; add it if you can. Add FIGDESCs to any and all likely illustrations.
    4. BLOCK QUOTATIONS. Spot-check for Qs. Q is often used wrongly, e.g. for small headings.

  9. PAGE BREAKS. Check placement of PBs: if the beginning of a content-bearing element coincides with the beginning of the page, move the PB inside the DIV (optionally also inside Ps, Qs, and LGs, but not inside Ls, ROWs, ITEMs, or HIs). That is, if the page begins with something that is tagged, tuck the PB inside the beginning of it; if it begins with two tags (say, DIV1 and HEAD) put the PB only inside the first. If a blank page precedes, put in BOTH PBs, one right after the other.

    Check completeness of PBs. In Find in Files, search for <PB[^>]*> with Regular expression and Binary files checked. The resulting list should show a PB for every page in the file, including blank pages at beginning and end.
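
    The completeness check can also be sketched as a short Python script that uses the same pattern. The function names and the expected_pages argument are illustrative assumptions; the page count itself has to come from the image set, counting blank pages.

```python
import re

# Same pattern as the Find in Files search: any PB tag, with or without attributes.
PB_TAG = re.compile(r"<PB[^>]*>", re.IGNORECASE)


def list_page_breaks(sgml_text):
    """Return every PB tag in the text, in document order."""
    return PB_TAG.findall(sgml_text)


def pb_count_matches(sgml_text, expected_pages):
    """True if the number of PB tags equals the number of pages in the image set."""
    return len(list_page_breaks(sgml_text)) == expected_pages


# Hypothetical sample: a blank page followed by two pages of text.
sample = '<PB><PB><DIV1 TYPE="text"><P>First page.</P><PB><P>Second page.</P></DIV1>'
print(len(list_page_breaks(sample)), "PB tags found")
print("complete" if pb_count_matches(sample, 3) else "PB count does not match page count")
```

    A matching count does not show that each PB is in the right place, so placement still needs to be checked against the page images.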

    If the image set includes two (or more!) copies of the same page, choose one to capture and omit the other(s); mark the uncaptured page with <GAP DESC="duplicate" EXTENT="1 page"> and a <PB> tag.

    Note: Sometimes the last image in the set includes the FIRST page of a book that was bound with the one that you are working on. Omit this material and treat the page as a blank flyleaf. Similarly, the first image in a set will sometimes include the last page of some other book. Treat this also as a blank flyleaf. Include a PB tag but no text.

  10. REVIEW PROOFSHEETS. Check proofsheets and correct. The proofsheets will usually have a few character-errors, excusable or not, that need correcting, along with spacing problems. Correct these. Use the proofsheets as indicators of possible other "global" problems: if a U is captured as a V on a proofsheet, it is worth checking others in the file; if numbered Ps are captured as ITEMs on a proofsheet, it is worth checking ITEMs throughout. If a note is seriously misplaced (or notes are inconsistently placed) on the proofsheet, the same is probably true throughout.


    A brief sample will show whether the MUSIC, MATH, and FOREIGN gaps are correctly used. Check spacing around <GAP DESC="foreign">. Early files often need spaces added on each side.
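
    A spacing check of this kind could be sketched in Python as follows. The function name is hypothetical, and the rule applied (whitespace, or a line boundary, on both sides of the tag) is a simplifying assumption: a gap followed directly by punctuation will also be flagged, so treat the output as a list of places to look at rather than a list of errors.

```python
import re

# A foreign-language GAP tag, allowing for further attributes after DESC.
FOREIGN_GAP = re.compile(r'<GAP DESC="foreign"[^>]*>')


def foreign_gaps_missing_space(sgml_text):
    """Return (line_number, line) for lines where a foreign GAP touches other text."""
    flagged = []
    for line_number, line in enumerate(sgml_text.splitlines(), start=1):
        for match in FOREIGN_GAP.finditer(line):
            # The start or end of a line counts as whitespace.
            before = line[match.start() - 1] if match.start() > 0 else " "
            after = line[match.end()] if match.end() < len(line) else " "
            if not (before.isspace() and after.isspace()):
                flagged.append((line_number, line.strip()))
                break  # one report per line is enough
    return flagged


# Hypothetical sample: the first gap is jammed against the text, the second is not.
sample = (
    'as the poet says,<GAP DESC="foreign">which is to say\n'
    'and again <GAP DESC="foreign"> he writes'
)
for line_number, line in foreign_gaps_missing_space(sample):
    print(line_number, line)
```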

    Illegibilities are harder. You may find individual letters marked as $, groups of letters marked as strings of $s (e.g. Lo$$on for "London"), illegible words marked as $word$, and pages, lines, and spans of text marked as $page$, $line$, and $span$.

    The notes file should already contain a count of illegibilities of the most common types. Searching for \$[^ ]*\$? (with regular expression, binary, and file count only checked) should confirm the overall count, which is the most important one: if there are fewer than 100 $-groups in the file, you should at least consider resolving them individually (either by supplying the correct reading or by deciding that the text really is illegible). Sometimes it is clear that the illegibles arise from some insuperable problem (such as a tight binding or cropped pages); in that case it is not usually worth while trying to resolve them individually. And if the book contains more than 100 illegibles, it is again not usually worth while trying to resolve them individually.
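
    The count and the 100 rule can be sketched in Python with the same regular expression. The function names are made up for illustration, and the sketch applies the pattern line by line on the assumption that this is how the Find in Files search treats it; the threshold remains a rule of thumb, not a hard limit.

```python
import re

# Same pattern as the search above: a $ followed by any run of non-space characters.
DOLLAR_GROUP = re.compile(r"\$[^ ]*\$?")


def count_illegible_groups(sgml_text):
    """Count $-groups, scanning line by line so that groups never span a line break."""
    return sum(len(DOLLAR_GROUP.findall(line)) for line in sgml_text.splitlines())


def worth_resolving_individually(group_count, threshold=100):
    """Apply the 100 rule: fewer than the threshold suggests resolving one by one."""
    return group_count < threshold


# Hypothetical sample containing Lo$$on, $word$, and $ohn: three $-groups.
sample = "Printed at Lo$$on for $word$ Smith,\nby $ohn Day."
total = count_illegible_groups(sample)
print(total, "illegible groups;",
      "consider resolving individually" if worth_resolving_individually(total)
      else "probably too many to resolve individually")
```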

    Once you have resolved the illegibles that you intend to, run the batch file "skint" (in TextPad, from the tools menu). This file edits the sgm file 'in place', replacing the $s with proper <GAP DESC="illegible"> tags, and saves the unmodified version in the same directory with the extension .bak.

    If you're resolving illegibilities individually, you'll find that many can be read (given contextual information) with at least 95% certainty. Feel free to insert the correct character in such cases based on context, so long as the physical form remaining does not contradict your conclusions as to the correct character. Do not attempt to supply a character when there is nothing in the original at all, no matter how correct or inevitable it might be. Those that cannot be resolved should be replaced by <GAP DESC="illegible" EXTENT="1"> (or whatever extent applies), normally by running "skint."

    Other problems with illegibility may require creative solutions, and they are too various to be listed here.

  13. VENDOR COMMENTS. Ensure that any problems noted in comments at the head of the file (occasionally supplied by vendors) have been resolved.

  14. PECULIARITIES. Ensure that all notable and peculiar features of the text (as observed while doing routine proofing and review) have been adequately captured. Refer to the guidelines and to the online tips, emails, etc., or ask advice from colleagues and supervisors. Common problematic features include tables, lists, indexes, figures, extended quotations, dialogues, dramatic features, stanzaic and verse structure (long lines or short? carry-over lines?), epigraphs, headings, and arguments, figures within figures, genealogies, etc.

  15. RUN CHECK from the TextPad tools menu. This looks for symptoms that may indicate tagging or capture problems. Like most diagnostic tests, "check" often yields false positives, but its results should only be ignored if you are sure that they should be.