Source: email
Date: 01 Feb 2002
File name: n/a
Vid: n/a
Page ref: n/a
Keywords: illegibility, dollar groups
Query. When we are correcting illegibilities, do we err on the side of representing the image or of helping people search? E.g., conti$gency can only be missing an n, but if the letter isn't printed at all, do we put it in or leave it out?
Answer. Even here there are contending schools of thought that wrestle more or less silently about this question. I think the consensus is as follows:
We favor the searcher, at least to this extent: if I am sure what the letter should be (95% sure); if there is *something* there; and if what is there does not conflict with my interpretation--then I'm willing to put in the "right" letter. If I am really not quite sure; or if there is effectively nothing there at all--then I leave it as illegible.
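A minimal sketch of the letter rule above, in Python. The function name `resolve_letter`, its parameters, and the reading of "95% sure" as a hard numeric threshold are assumptions made for illustration; nothing here is an actual project tool.

```python
def resolve_letter(candidate, confidence, something_there, conflicts,
                   threshold=0.95):
    """Return the letter to supply, or None to leave it as illegible.

    candidate       -- the reviewer's best guess at the letter (hypothetical input)
    confidence      -- how sure the reviewer is, from 0.0 to 1.0
    something_there -- True if any trace of a letter is visible in the image
    conflicts       -- True if the visible trace contradicts the guess
    threshold       -- assumed cut-off standing in for "95% sure"
    """
    if confidence >= threshold and something_there and not conflicts:
        return candidate
    # Not sure enough, nothing there, or the trace disagrees: stays illegible.
    return None


# conti$gency with a faint mark consistent with "n": supply the letter.
print(resolve_letter("n", 0.97, True, False))
# Same guess, but nothing printed at all: leave it illegible.
print(resolve_letter("n", 0.97, False, False))
```

All three conditions have to hold together; a confident guess alone is not enough if the image shows nothing or shows something else.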
So much for letters. We are a little looser with respect to punctuation, especially since this often comes down to questions of little moment: "is that a smudge or a period?" "is that a smeared period or a comma?" In cases where it is not clear whether there is any punctuation at all, I prefer to omit all mention of it rather than put in a <GAP>. I.e., prefer "St John 4.17" to "St<GAP DESC="illegible" EXTENT="1"> John 4.17". And when it is a question of deciding between "," and "." I'd rather put in one or the other (based on likelihood or consistency) than GAP it.
Now that I've said this, I'll ask John if this is actually what we do.
Source: notes file
Date: 3 Sep 2003
File name: S1715
Keywords: GAP
An extraneous two-sided page (one side was a title page from another work; the other side was text unrelated to the surrounding pages) had been inserted between the left page of image 49 and the right page of image 50. I treated them as blank pages.
PFS: Apparently from S1734 or similar work. I added <GAP DESC="intruder"> tags.
Source: notes file
Date: Jan. 24 2005
File name: pdcc/S13953
Keywords: title page, intruder
It appears that an extra title page that does not belong was scanned at the beginning of this image file. It in fact belongs to STC 13243; the keyers have recorded it and we have left it in the sgm file because it is in the image at this time (but shouldn't be).
PFS: we've given ourselves three options when we find intrusions
from other books in the image set:
(1) if we can identify the intruding work, and it is substantial
enough to be worth recording, we simply split the file, and
name the new file, containing the intruder, after the intruder's
STC number (this gets a little complicated, because we
need to find or create a new ID string (<IDG>) for the new
file, give it a new ID number, etc., but we do all this
when necessary).
(2) if we cannot identify the intruder, and it is substantial or
occurs in the middle of the text somewhere, we omit it with
<GAP DESC="intruder">.
(3) if the intrusion is simply a title page, or last page (etc.)
tacked on at the beginning or end where one would expect a
blank flyleaf if anything, we treat the offending page as
if it were a blank flyleaf. I.e., omit the material, include
only the blank <PB> tag.
Your example is a little peculiar: it appears at the beginning (one
would expect a stray titlepage at the end); and its subject matter
appears related to that of the main work. Still, it does not seem
to belong, and falls somewhere between (2) and (3). Since (3) would
leave the text with three unexplained blank pages before the title page, I
went with (2), which leaves them blank but at least explains why.
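The three options can be sketched as a small Python function. The function name, the flag names, and the returned labels are hypothetical, chosen only to mirror options (1) to (3); cases that fall between the options (like the one just discussed) are returned as a judgment call rather than decided mechanically.

```python
def intruder_action(identified, substantial, in_middle, stray_edge_page):
    """Suggest how to handle material intruding from another book.

    identified      -- True if the intruding work can be identified
    substantial     -- True if it is substantial enough to be worth recording
    in_middle       -- True if it occurs in the middle of the text
    stray_edge_page -- True if it is just a title page or last page tacked
                       on where one would expect a blank flyleaf
    """
    if identified and substantial:
        return "split"      # option (1): split off a new file for the intruder
    if not identified and (substantial or in_middle):
        return "gap"        # option (2): omit with <GAP DESC="intruder">
    if stray_edge_page:
        return "blank"      # option (3): treat as a blank flyleaf, <PB> only
    return "judgment"       # falls between the options; decide by hand


print(intruder_action(True, True, False, False))
print(intruder_action(False, False, True, False))
print(intruder_action(True, False, False, True))
```

Note that the S13953 case would come out as "blank" under these flags, yet option (2) was chosen to avoid unexplained blank pages; the sketch is a first guess, not a substitute for looking at the book.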
Source: notes file
Date: 4 Oct 2004
File name: S647.take2
Keywords: GAP
The full citation (bib record) for this text states that "The inserted leaf after 309 is a letter from the Emperor of Russia to Edward VI". This insertion interrupts the flow of the text around it so I have made it a <Q><TEXT><BODY>, with a DIV type "insertion", and then <LETTER> tags inside this (the letter is followed by some additional comments).
PFS: this seems reasonable; an alternative open to us, if the insertion is really random and has nowt to do with the book at all, is to omit the text altogether and insert a <GAP DESC="intruder">
Source: notes file
Date: 9 Feb 2004
File name: Wh1249aa
Keywords: GAP, catchword
Between p19 on image 11 and p20 on image 12, SPI/PDCC had inserted
<GAP DESC="MISSING" EXTENT="1 page"> because the catchwords don't match.
However, the page numbers run on, and while there seem to be some words
missing, this might be a printer error rather than an instance of missing
pages. I've taken the GAP out for this reason. I have done the same
between images 66 and 67.
You (Paul) might disagree though?
PFS: I agree that the text is discontinuous, and that the missing bit seems unlikely to be two full pages. I inserted <GAP DESC="missing" EXTENT="1 span"> as a compromise measure that indicates that we (or rather SPI!) have noticed something missing, but which does not go so far as to claim that a pair of pages is missing.
Source: notes file
Date: 30 May 2003
File name: S8158
Keywords: GAP
At the top of the page there are three spaces into which names and places have been handwritten (presumably at the time of issue). I have inserted 3 missing word gaps - is this ok?
PFS: a philosophical question! Are the blanks placed deliberately in form letters different in kind from other kinds of 'missing' GAPs?
Some options:
(1) &blank; but really can't use this (the ISOpub entity for 'significant
blank symbol'); perhaps &leftblank;? --or:
(2) <GAP DESC="missing" REASON="left blank" EXTENT="1 word"> --or:
(3) <GAP DESC="blank" EXTENT="1 word"> (i.e. invent new attribute token)
I've gone with (3) for the moment, till a better thought comes.
Source: notes file
Date: 6 May or 5 Jun 2004
File name: Ws5504
Keywords: punctuation, GAP
Long dashes (on pages 43, 83, 84) captured as <GAP DESC="BLANK">. Changed to —.
PFS: These are 'ellipsis' dashes, of the sort, "I learned much from Mr. B----". Actual example:
<HEAD>Vpon — his Picture Prefixt to his Almanack.</HEAD>
Though these certainly do represent bits that are deliberately 'left blank', we've tended, as Amanda indicates, to treat these as dashes rather than as blanks, the latter being reserved for blanks intended to be filled in, as in a legal form or the facsimile of a legal form.
Source: notes file
Date: 2005-01-14
File name: pdcc/Wt1813
Keywords: GAP
Wasn't sure what to do with this: some of the sense is missing, but I don't think any of the original printed book is missing. It looks more like the material wasn't printed. I put a <GAP> tag in, but should I have done?
<PB N="16" REF="19">
[...]
<SP>
<SPEAKER>T.</SPEAKER>
<P><HI>Repeating it over again after him, said</HI>,</P>
</SP>
<GAP DESC="MISSING" EXTENT="1 paragraph">
<SP>
<PB N="17" REF="19">
<SPEAKER><HI>C.</HI></SPEAKER>
<P>Which he took thus away; That which proves the
thing denied, is sufficient; But that Subcontrary propositions
in a Contingent matter may be both true, proves the thing de|nied,
that some infants may not be Baptized, some infants may be Baptized;
Therefore it is sufficient.</P>
</SP>
PFS: well, there is certainly something missing, and it is
probably less than a page, and probably a printer's error.
If we're willing to use <GAP> to indicate missing text at all
(which is a slight misuse to begin with--it's meant to indicate
text that is there but which we have left untranscribed), we
might as well extend it to text missing because of accidents
other than lost pages. We could include a REASON if that
is any solace:
<GAP DESC="missing" EXTENT="1 paragraph" REASON="omitted in print">
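For anyone generating these tags by script, a rough Python sketch of a helper that assembles a <GAP> tag from the attributes used in these notes. The helper name `gap_tag` and the DESC, EXTENT, REASON attribute order are assumptions for illustration; the notes do not prescribe either.

```python
from xml.sax.saxutils import quoteattr


def gap_tag(desc, extent=None, reason=None):
    """Build a <GAP> tag string; EXTENT and REASON are left out when not given."""
    attrs = [("DESC", desc)]
    if extent:
        attrs.append(("EXTENT", extent))
    if reason:
        attrs.append(("REASON", reason))
    # quoteattr escapes the value and wraps it in quotation marks.
    parts = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    return f"<GAP {parts}>"


print(gap_tag("missing", "1 paragraph", "omitted in print"))
print(gap_tag("intruder"))
```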