OVERVIEW of EEBO processing at Michigan and Oxford. (The Toronto process is similar.)
- Each month, reviewers will receive a batch of files delivered
into a date-stamped directory in the MARKUP\EEBO\WORK\TODO folder
(Michigan) or the TEXTS\INBOX folder (Oxford) on their shared drive
(Michigan: "F:" or "G:"; Oxford: "P:"). The name of the directory
will normally correspond to the previous month (since that is
the month when the files actually arrived from the vendors).
With each batch comes a file list, received as a text file
but converted to and stored as a spreadsheet named after the
same month, e.g. filelist.200408.xls. It contains the file name
(based on the STC number and keying vendor), the UM ID number
(beginning with "A"), and the number of bytes in the file as
received, for example:
S3456.tech.sgm A12345 14,234
Wd456.pdcc.sgm A23567 2,345,598
- The file list will be placed and kept in the same directory
as the incoming files (Michigan) or in the METADATA folder (Oxford).
- A reviewer will normally work on one text at a time, though
some people may find it more efficient to work on up to five files at once.
- The basic process consists of two steps: proofing and review.
Only when a text is accepted on the basis of character accuracy will
the reviewer take the time to review and correct the tagging.
- To claim a text to work on, the reviewer will move a
file from the inbox to a working directory named after the
reviewer (john, mona, olivia, emma, etc.) in /Markup/EEBO/Work (Michigan)
or /TEXTS (Oxford). This directory should be available to the other
reviewers and supervisors, since it indicates which files are
in progress at any given time. But no one should touch
files in an in-progress directory except the reviewer who owns it.
- A notes file accompanies each file from the moment work
begins on it. This file preserves the basic facts about
what was done to the file, and also serves to communicate
queries and problems either up the ladder at TCP (i.e. to Paul),
or further on to the conversion firms themselves.
- Whilst a file is being worked on, all associated files
(e.g. the image file, notes file, test file (sgm),
test file (html), stripped file, and perhaps also
temporary copies of the main file) will reside in
the working directory. The reviewer is encouraged
to create working copies of the file on the local
hard drive (C:) in order to speed processing, but should
save the results daily to the in-progress directory on
the shared drive. Each reviewer is responsible for the
files claimed for review, and should work out a reasonable
way of creating backups, and of saving copies of
completed work, in case the working copy is corrupted.
- Proofing is done on a sample of the text (roughly 5%
or 5 pages, whichever is more), by comparing a printout
of the sample text (printed from an HTML version) to
page-images on screen (viewed in the form of PDFs).
- Once a text has passed the proofing stage, it is 'reviewed'
and corrected, especially with regard to tagging, but sometimes
also with respect to character-level transcription. It is
at this point that the .sgm file itself is edited, either
in a text editor (TextPad, etc.), or in a dedicated SGML
editor (XMetaL, etc.), or both.
- Once a file is done (either accepted and corrected or
rejected), the main .sgm file will be moved
either to DONE or REJECTED, and the associated notes
file will be moved to NOTES. All other associated
files will be deleted (except local and personal
backups, as above).
- Proofing is done from paper printouts of a sample of the
text, with a coversheet. This paper copy should be
preserved for at least a month after completion (maybe
two, or even more); the coversheet is simply a paper copy
of the 'notes' file associated with the text.
- When a file has been disposed of (either completed or
rejected), additional information should be entered in
the monthly file list spreadsheet: verdict on the file (accept,
reject, pardon); name of reviewer; date (either
rejected date or "done" date as appropriate) in
yyyy-mm-dd form; and a few vital numbers: sample size in bytes
(the 'stripped sample size'); number of excusable errors;
number of inexcusable errors; number of 'unwarranted
illegibles' ('bad $s'); and number of $-groups. All of these
numbers come from the notes file.
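
As an illustration only, the bookkeeping in the last step can be
sketched in Python. The sketch parses one row of the file list in the
form shown earlier (file name, UM ID, byte count) and fills in the
disposition fields listed above. The names FileListRow, parse_row and
record_disposition, the whitespace-separated row layout, and the
numbers in the usage lines are assumptions made for the example; they
are not part of the TCP procedure or of any existing tool.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional

# Verdicts named in the overview: accept, reject, pardon.
VERDICTS = {"accept", "reject", "pardon"}


@dataclass
class FileListRow:
    """One row of the monthly file list (hypothetical structure)."""
    file_name: str                 # e.g. "S3456.tech.sgm"
    um_id: str                     # UM ID number, begins with "A"
    bytes_received: int            # size of the file as received
    # Fields filled in once the file is disposed of:
    verdict: Optional[str] = None
    reviewer: Optional[str] = None
    date_done: Optional[str] = None          # yyyy-mm-dd
    sample_bytes: Optional[int] = None       # 'stripped sample size'
    excusable_errors: Optional[int] = None
    inexcusable_errors: Optional[int] = None
    bad_dollars: Optional[int] = None        # 'unwarranted illegibles'
    dollar_groups: Optional[int] = None      # number of $-groups


def parse_row(line: str) -> FileListRow:
    """Parse a row such as 'S3456.tech.sgm A12345 14,234'.

    Assumes three whitespace-separated columns and a byte count
    written with comma thousands separators.
    """
    file_name, um_id, size = line.split()
    if not um_id.startswith("A"):
        raise ValueError(f"UM ID should begin with 'A': {um_id!r}")
    return FileListRow(file_name, um_id, int(size.replace(",", "")))


def record_disposition(row: FileListRow, verdict: str, reviewer: str,
                       when: date, sample_bytes: int, excusable: int,
                       inexcusable: int, bad_dollars: int,
                       dollar_groups: int) -> FileListRow:
    """Fill in the disposition fields, using numbers from the notes file."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    row.verdict = verdict
    row.reviewer = reviewer
    row.date_done = when.isoformat()   # date.isoformat() gives yyyy-mm-dd
    row.sample_bytes = sample_bytes
    row.excusable_errors = excusable
    row.inexcusable_errors = inexcusable
    row.bad_dollars = bad_dollars
    row.dollar_groups = dollar_groups
    return row


if __name__ == "__main__":
    # Illustrative values only; real numbers come from the notes file.
    row = parse_row("S3456.tech.sgm A12345 14,234")
    record_disposition(row, "accept", "john", date(2004, 9, 15),
                       sample_bytes=5120, excusable=3, inexcusable=1,
                       bad_dollars=0, dollar_groups=4)
    print(asdict(row))
```

Keeping the received byte count and the disposition fields in one
record mirrors the single spreadsheet row per file described above.
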
For detailed instructions on the various stages,
see the following companion documents: