OVERVIEW of EEBO processing at Michigan and Oxford. (The Toronto process is similar.)

  1. Each month, reviewers will receive a batch of files delivered into a date-stamped directory in the MARKUP\EEBO\WORK\TODO folder (Michigan) or the TEXTS\INBOX folder (Oxford) on their shared drive (Michigan: "F:" or "G:"; Oxford: "P:"). The name of the directory will normally correspond to the previous month (since that is the month when the files actually arrived from the vendors).

    Each batch of files comes with a file list, received as a text file but converted to and stored as a spreadsheet. The file list is named after the same month, e.g. filelist.200408.xls, and contains, for each file, the file name (based on the STC number and keying vendor), the UM ID number (beginning with "A"), and the number of bytes in the file as received:

    S3456.tech.sgm	A12345     14,234
    Wd456.pdcc.sgm	A23567  2,345,598
    
    etc.

  2. The file list will be placed and kept in the same directory as the incoming files (Michigan) or in the METADATA folder (Oxford).

  3. A reviewer will normally work on one text at a time, though some people may find it more efficient to work on up to five files at a time.

  4. The basic process consists of two steps: proofing and review. Only when a text is accepted on the basis of character accuracy will the reviewer take the time to review and correct the tagging.

  5. To claim a text to work on, the reviewer will move a file from the incoming folder (TODO at Michigan, INBOX at Oxford) to a working directory named after the reviewer (john, mona, olivia, emma, etc.) in MARKUP\EEBO\WORK (Michigan) or TEXTS (Oxford). This directory should be visible to the other reviewers and supervisors, since it indicates which files are in progress at any given time, but no one should touch files in an in-progress directory except the reviewer who owns it.

  6. A notes file accompanies each file from the moment work begins on it. This file preserves the basic facts about what was done to the file, and also serves to communicate queries and problems either up the ladder at TCP (i.e. to Paul), or further on to the conversion firms themselves.

  7. Whilst a file is being worked on, all associated files (e.g. the image file, notes file, test file (sgm), test file (html), stripped file, and perhaps also temporary copies of the main file) will reside in the working directory. The reviewer is encouraged to create working copies of the file on the local hard drive (C:) in order to speed processing, but should save the results daily to the in-progress directory on the shared drive. Each reviewer is responsible for the files claimed for review, and should work out a reasonable way of creating backups, and of saving copies of completed work, in case the working copy is corrupted or lost.

  8. Proofing is done on a sample of the text (roughly 5% or 5 pages, whichever is more), by comparing a printout of the sample text (printed from an HTML version) to page-images on screen (viewed in the form of PDFs).

  9. Once a text has passed the proofing stage, it is 'reviewed' and corrected, especially with regard to tagging, but sometimes also with respect to character-level transcription. It is at this point that the .sgm file itself is edited, either in a text editor (TextPad, etc.), or in a dedicated SGML editor (XMetaL, etc.), or both.

  10. Once a file is done (either accepted and corrected or rejected), the main .sgm file will be moved either to DONE or REJECTED, and the associated notes file will be moved to NOTES. All other associated files will be deleted (except local and personal backups, as above).

  11. Proofing is done from paper printouts of a sample of the text, with a coversheet. This paper copy should be preserved for at least a month after completion (maybe two, or even more); the coversheet is simply a paper copy of the 'notes' file associated with the text.

  12. When a file has been disposed of (either completed or rejected), additional information should be entered in the monthly file list spreadsheet: verdict on the file (accept, reject, pardon); name of reviewer; date (either rejected date or "done" date as appropriate) in yyyy-mm-dd form; and a few vital numbers: sample size in bytes (the 'stripped sample size'); number of excusable errors; number of inexcusable errors; number of 'unwarranted illegibles' ('bad $s'); and number of $-groups. All of these numbers come from the notes file.
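
As an illustration only, the file list lines shown in step 1 and the fields recorded in step 12 could be handled with a short Python script. This is a sketch under stated assumptions, not part of the TCP tooling: it assumes the three columns are separated by whitespace (tabs or spaces), and the names parse_filelist_line and Disposition are invented here for the example.

```python
from dataclasses import dataclass
from datetime import date


def parse_filelist_line(line):
    """Split one file-list line into (file name, UM ID, size in bytes).

    Assumes whitespace-separated columns, as in the sample lines in step 1.
    """
    name, um_id, size = line.split()
    if not um_id.startswith("A"):
        raise ValueError(f"UM ID should begin with 'A': {um_id!r}")
    # Byte counts are written with thousands separators, e.g. 2,345,598.
    return name, um_id, int(size.replace(",", ""))


@dataclass
class Disposition:
    """Hypothetical record of the values entered in the file list (step 12)."""
    verdict: str             # accept, reject, or pardon
    reviewer: str
    done_date: date          # rejected date or "done" date, as appropriate
    sample_bytes: int        # the 'stripped sample size'
    excusable_errors: int
    inexcusable_errors: int
    bad_illegibles: int      # 'unwarranted illegibles' ('bad $s')
    dollar_groups: int       # number of $-groups

    def as_row(self):
        """Return the values as a list, with the date in yyyy-mm-dd form."""
        if self.verdict not in ("accept", "reject", "pardon"):
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        return [
            self.verdict,
            self.reviewer,
            self.done_date.isoformat(),
            self.sample_bytes,
            self.excusable_errors,
            self.inexcusable_errors,
            self.bad_illegibles,
            self.dollar_groups,
        ]
```

For example, parse_filelist_line("Wd456.pdcc.sgm A23567 2,345,598") returns the file name, the ID "A23567", and the integer 2345598. The numbers passed to Disposition would be copied from the notes file, as step 12 requires.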

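The file movements in steps 5 and 10 can also be sketched in Python. Again this is an assumption-laden illustration rather than an existing script: the functions claim and dispose are hypothetical, the directory arguments stand for whichever folders the site uses (TODO or INBOX, DONE, REJECTED, NOTES), and those folders are assumed to exist already. Instead of deleting the remaining associated files, the sketch only lists them, so that the reviewer can check them before removing anything.

```python
import shutil
from pathlib import Path


def claim(inbox, work_root, reviewer, filename):
    """Move one file from the inbox to the reviewer's working directory."""
    working_dir = Path(work_root) / reviewer
    working_dir.mkdir(parents=True, exist_ok=True)
    target = working_dir / filename
    if target.exists():
        # Never overwrite a file that is already in progress.
        raise FileExistsError(target)
    shutil.move(str(Path(inbox) / filename), str(target))
    return target


def dispose(sgm_path, notes_path, rejected, done_dir, rejected_dir, notes_dir):
    """Move the .sgm file to DONE or REJECTED and the notes file to NOTES.

    Returns the names of the files left in the working directory.
    """
    sgm_path, notes_path = Path(sgm_path), Path(notes_path)
    working_dir = sgm_path.parent
    dest = Path(rejected_dir if rejected else done_dir)
    shutil.move(str(sgm_path), str(dest / sgm_path.name))
    shutil.move(str(notes_path), str(Path(notes_dir) / notes_path.name))
    return sorted(p.name for p in working_dir.iterdir())
```

The notes file is passed in explicitly because this overview does not fix a naming convention for it.
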
For detailed instructions on the various stages, see the following companion documents: