Matters Philosophical

Correcting illegibilities
Counting in/excusable errors
Purpose of DIV types
Printer's errors

1. Correcting Illegibilities

Question: When correcting illegibilities, do we err on the side of representing the image, or helping people search?

There are contending schools of thought, but the consensus is:

We favor the searcher, at least to this extent: if I am sure what the letter should be (95% sure); if there is something there; and if what is there does not conflict with my interpretation -- then I'm willing to put in the "right" letter. If I am really not quite sure; or if there is effectively nothing there at all -- then I leave it as illegible.
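The rule above amounts to three conditions that must all hold before a letter is supplied. A minimal sketch of it follows; the function name, its parameters, and the idea of expressing "sure" as a number are illustrative assumptions, not part of any existing tool.

```python
SURE_ENOUGH = 0.95  # the "95% sure" threshold mentioned above


def resolve_doubtful_letter(reading, confidence, ink_present, conflicts_with_image):
    """Return the letter to record, or None to leave the position illegible.

    reading              -- the letter the reviewer believes is intended
    confidence           -- reviewer's own estimate, 0.0 to 1.0 (hypothetical)
    ink_present          -- True if there is something there on the image
    conflicts_with_image -- True if what is visible contradicts the reading
    """
    if confidence >= SURE_ENOUGH and ink_present and not conflicts_with_image:
        return reading
    # Not quite sure, nothing there, or the image disagrees: leave as illegible.
    return None
```

For example, a confident reading of a blank spot still stays illegible, because there is effectively nothing there.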

2. Counting in/excusable errors

(a) The five classes of errors listed in the notes files--namely:

    (1) inexcusable character-errors (mistranscriptions)
    (2) excusable character-errors (mistranscriptions)
    (3) COMPLETELY unwarranted use of $ or $word$ etc.
    (4) missing letters supplied
    (5) spacing problems

    --are mutually exclusive. That is, if you count an error as
    belonging under (3), you shouldn't count the same error
    under (1).

(b) The error count to be used in deciding whether to accept
    a book is the sum of (1) and (3). You can consider the
    others as indicators of the general quality of the text,
    but we don't count them against the 1:20k spec. [In books
    received before 8/01, count only (1).]

(c) In considering "$"s for potential inclusion under (3),
    we are biased in favor of the vendor in several respects:
    (i) we assume the burden of proof and award doubtful
    cases to the vendor; (ii) we limit ourselves to visual
    evidence only (excluding our 'reading' knowledge); and
    (iii) we honor the vendor's own criteria: e.g., in
    the case of Apex, if there are more than two contiguous
    letters doubtful enough to be called "$", we allow them
    to call the whole word $word$, annoying though this is.
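The arithmetic in (b) can be sketched as follows. This is only an illustration: the function names are made up, and it assumes that "1:20k" means no more than one counted error per 20,000 characters sampled, which the notes above do not spell out.

```python
def counted_errors(tallies, received_before_cutoff=False):
    """Sum the error classes that count against the spec.

    tallies -- dict mapping error class (1 to 5) to the number found
    received_before_cutoff -- True for books received before 8/01,
                              where only class (1) is counted
    """
    if received_before_cutoff:
        return tallies.get(1, 0)
    return tallies.get(1, 0) + tallies.get(3, 0)


def meets_spec(tallies, chars_sampled, received_before_cutoff=False,
               chars_per_error=20_000):
    """True if the counted errors stay within one per chars_per_error.

    chars_per_error=20_000 is an assumed reading of the 1:20k spec.
    """
    errors = counted_errors(tallies, received_before_cutoff)
    return errors * chars_per_error <= chars_sampled


tallies = {1: 1, 2: 4, 3: 1, 4: 2, 5: 3}
print(counted_errors(tallies))        # classes (1) + (3) only
print(meets_spec(tallies, 40_000))    # two counted errors in 40,000 characters
```

Classes (2), (4), and (5) are carried in the tallies but never enter the sum, which mirrors their role as quality indicators only.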

3. Purpose of DIV types

If, as commonly happens, you find that a vendor has inserted a DIV1 tag that is coterminous
with the BODY--i.e., a DIV1 that simply contains all the body of the
book, rather than only part of it--be aware that:

(1) Such a DIV1 is superfluous. It may nearly always be safely
    removed except when doing so would leave the BODY without
    any DIVs at all.

(2) The only two places I can think of where we commonly insert
    such a DIV1 deliberately are: (a) within the <Q><TEXT><BODY>
    construction, where we frequently use a <DIV1> of a particular
    TYPE in order to specify the nature of a quoted text
    (e.g. <Q><TEXT><BODY><DIV1 TYPE="document">); and (b) when the
    BODY would otherwise have no DIVs at all.

(3) Such a DIV1 is basically harmless. If removing it requires that
    we change the level of many subordinate DIVs, we usually
    leave it in place.

(4) Such a DIV1 usually receives a TYPE attribute of "text", rather
    than (say) "book" or "body", just by tradition; in retrospect,
    a TYPE="body" would probably make more sense and be more consistent
    with our other practice.

(5) We usually do *not* use the TYPE attribute of such a DIV1 as
    a backdoor way to indicate the genre of the book as a whole.
    Such generic classification is better left to the information
    in the bibliographic record, and therefore in the header,
    where it is under cataloguers' control, than to us. That is,
    we do *not* say <BODY><DIV1 TYPE="treatise">...</DIV1></BODY>
    or <BODY><DIV1 TYPE="almanac">...</DIV1></BODY>.

(6) There is, however, no harm done by using the TYPE
    attribute in this way. It is just that it takes us into
    a kind of subject cataloguing that is not our job and can
    easily become quite challenging. In a few well-defined genres,
    in which we may someday wish to create restricted searches, it may
    even provide some benefit to retrieval (e.g. TYPE="play"
    and TYPE="letter"), but not enough to make up for the cost it
    would impose if we chose to carry it through consistently.

So feel free to say DIV1 TYPE="play" or DIV1 TYPE="letter" (and maybe a few more common genres that occur to you), even when the whole book consists of the play or letter in question--but if the genre does not fall into one of these obvious categories, just use TYPE="text" rather than searching for an appropriate genre.

(7) The most important function of the TYPE attributes on DIVs
    in general is to clarify the relationship of parts of
    the book to each other and to the book as a whole. I
    realize that this too can involve some generic labelling,
    but this is only a kind of byproduct of the primary purpose
    of explicating relationships.
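Points (1), (3), and (6) can be turned into a rough check. The sketch below assumes the file has already been normalized to well-formed XML with upper-case tag names (the vendor files themselves may be SGML and need conversion first); the function names and the returned messages are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

DIV_TAG = re.compile(r"DIV[1-7]$")
OBVIOUS_GENRES = {"play", "letter"}  # the two genres named above


def coterminous_div1(body):
    """Return the DIV1 that wraps the whole BODY, or None if there is none."""
    children = list(body)
    if len(children) != 1 or children[0].tag != "DIV1":
        return None
    div1 = children[0]
    # Any text in the BODY outside the DIV1 means it is not coterminous.
    if (body.text or "").strip() or (div1.tail or "").strip():
        return None
    return div1


def review_wrapper(body):
    """Summarize what removing a coterminous DIV1 would involve."""
    div1 = coterminous_div1(body)
    if div1 is None:
        return "no coterminous DIV1"
    # Counts every nested DIV, including any inside quoted texts (a simplification).
    nested = [el for el in div1.iter() if el is not div1 and DIV_TAG.match(el.tag)]
    if not nested:
        return "keep: BODY would have no DIVs"
    return f"removable: {len(nested)} subordinate DIVs to renumber"


def wrapper_type(genre=None):
    """TYPE for a whole-book DIV1: an obvious genre, otherwise plain "text"."""
    return genre if genre in OBVIOUS_GENRES else "text"
```

Whether the number of subordinate DIVs counts as "many" is still a judgment call for the reviewer; the sketch only reports the count.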

4. Printer's errors

Question: I have just come across some very clear type which says "fonne of God." Do I record it as "fonne" or "sonne"?

We would go with "fonne", although the printer would not thank us for perpetuating his mistake. The only situations in which we normally correct the printed copy have to do with attribute values: if the sequence of chapters is clear, but one is misnumbered (e.g.: LI, LII, LIII, LIV, LV, LIV, LVII), we set the value of the "N" attribute to the underlying, rather than to the printed, number (N="54", N="55", N="56"), while leaving the literal text as it is. We occasionally deal with mismatched footnote numbers the same way, by using the 'correct' number as the value of the N attribute and ignoring the incorrect one. Finally, there are times when a symbol of some kind is used with a meaning quite different from its usual meaning; in that case (if the symbol is based on a letter), we often preserve only the base letter within <ABBR> tags, rather than record a semantically misleading character entity. Thus "Esq;" is captured as <ABBR>Esq</ABBR> rather than as Es&abque;.
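The chapter-number case can be sketched in a few lines. This is an illustration rather than an existing tool: the function names are made up, and the rule used (correct a number only when its neighbours on both sides confirm the sequence) is one cautious reading of "if the sequence of chapters is clear".

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a roman numeral; additive forms such as IIII also work.

    Early-modern variants (a final "j", for instance) are not handled.
    """
    values = [ROMAN[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV, IX, XL ...).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def n_values(printed):
    """Return N attribute values, correcting an isolated misnumbering."""
    numbers = [roman_to_int(p) for p in printed]
    fixed = list(numbers)
    for i in range(1, len(numbers) - 1):
        expected = fixed[i - 1] + 1
        # Correct only when the following number confirms the sequence.
        if numbers[i] != expected and numbers[i + 1] == expected + 1:
            fixed[i] = expected
    return fixed


print(n_values(["LI", "LII", "LIII", "LIV", "LV", "LIV", "LVII"]))
# [51, 52, 53, 54, 55, 56, 57]
```

Only the N values change; the printed numerals themselves stay in the transcribed text exactly as they appear.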