EEBO coding query #P15

Duplicate and missing pages

Re: STC 19352
Title: "An answvvere to the fifth part of Reportes lately sent forth by Syr Edvvard Cooke Knight, the Kinges Attorney generall"
By: Parsons, Robert, 1546-1610

Q. We have found that images 80 to 102 are just duplicates of actual pages 48 to 93. However, we noticed that in the first copy pages 66-67, 78-79, and 86-87 are missing. Could we key <gap desc="duplicate pages" extent="46"> to replace duplicate pages 48 to 93? Are we correct to use pages 66-67, 78-79 and 86-87 from the duplicates?

A. Here's a general answer first:

If you notice that the same page has been filmed twice, you are encouraged to capture it only once, and to record the other copy with an empty <GAP> tag (DESC="duplicate page").
There is no firm rule as to *which* copy to keep and which to <GAP> out, except that it would be sensible to keep the better copy and exclude the worse one. Often the second copy will be the better one (the photographers usually made a second copy because they thought there might be something wrong with the first).
If there is a duplicate run of images, and one is complete and the other incomplete, normally you should keep the complete set and exclude the incomplete set.
If the situation gets more complicated, e.g. if both sets are incomplete but are missing different pages, or if only one set is complete but includes some bad images that can be replaced by images from the incomplete set, you may have to mix and match.
In any case, the desired result is: the best possible text, from the best images, in the right order.
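
To make the mix-and-match idea concrete, here is a rough sketch in Python. It is only an illustration, not part of any keying tool: the PageImage record, the numeric "quality" score, and the choose_pages function are all hypothetical names invented for this example. It assumes someone has already listed, for each copy, which page appears on which image and how legible it is. For each page it keeps the better image, prefers the second copy on a tie, and marks every other copy of that page for the <GAP> treatment.

```python
from dataclasses import dataclass


@dataclass
class PageImage:
    page: int      # page number as printed in the book
    image: int     # image number on the film
    quality: int   # hypothetical legibility score, higher is better


def choose_pages(first_copy, second_copy):
    """Return (keep, gap): one image per page to capture, the rest to <GAP> out."""
    best_by_page = {}
    for copy in (first_copy, second_copy):
        for cand in copy:
            best = best_by_page.get(cand.page)
            # ">=" means the second copy wins when both are equally good.
            if best is None or cand.quality >= best.quality:
                best_by_page[cand.page] = cand
    # Captured pages, listed in page order.
    keep = [best_by_page[p] for p in sorted(best_by_page)]
    # Every other filmed copy of a page is a duplicate to be gapped out.
    gap = [c for copy in (first_copy, second_copy) for c in copy
           if best_by_page[c.page] is not c]
    return keep, gap


# Invented data: two incomplete copies that are missing different pages.
first = [PageImage(48, 10, 2), PageImage(49, 10, 2), PageImage(51, 11, 3)]
second = [PageImage(48, 80, 2), PageImage(49, 80, 1), PageImage(50, 81, 3)]
keep, gap = choose_pages(first, second)
for c in keep:
    print(f"capture page {c.page} from image {c.image}")
for c in gap:
    print(f"gap out page {c.page} on image {c.image}")
```

In practice the "quality" judgement is made by eye, so the score here simply stands in for the keyer's decision about which image is the better one.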

Any images that are given the <GAP> treatment should, I think, be represented by a separate <GAP> tag for each page (not each image), rather than by a single <GAP> tag that attempts to cover a span of pages or a span of images. This is so that each image, regardless of whether it is captured or not, will still be represented in the text by a <PB> tag.
Each <PB> tag should, of course, point to the actual image number using the REF attribute.
In this case, it looks like the easiest thing to do is to record the FIRST (incomplete) copy using only <GAP> tags, and then capture the second (complete) copy as usual (or, if the first copy has already been keyed in, just renumber it as if it were the second copy).
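
As an illustration of the one-<GAP>-per-page rule, the following Python sketch prints the tags for a run of excluded pages. It rests on several assumptions that should be checked against the keying instructions: the gap_tags function is a hypothetical helper, the N attribute for the page number is assumed (only REF and DESC are mentioned above), the <GAP> is assumed to follow its <PB> directly, and image number 60 is made up for the example.

```python
def gap_tags(pages, desc="duplicate page"):
    """Build one <PB>/<GAP> pair per excluded page.

    pages: (image_ref, page_number) pairs in film order. Two pages that
    share an image simply repeat the same image_ref.
    """
    lines = []
    for image_ref, page_number in pages:
        # REF points to the actual image number; N (assumed) carries the page number.
        lines.append(f'<PB N="{page_number}" REF="{image_ref}">')
        # One empty GAP per page, never one GAP for a whole span.
        lines.append(f'<GAP DESC="{desc}">')
    return "\n".join(lines)


# Hypothetical: image 60 of the excluded copy shows pages 48 and 49.
print(gap_tags([(60, 48), (60, 49)]))
```

Because the function works page by page, an image showing two pages yields two <PB> tags with the same REF value, each followed by its own <GAP>.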