DTD and image sets

Changes to SP, TRAILER, HEAD, and LIST
DTD changes for proofing
Muddled images
PDF with cropped images

Changes to SP, TRAILER, HEAD, and LIST

Source: email
Date: 12 Nov 2003
Keywords: DTD

With 2003 speeding to a close, it's not too early to think about changes, fundamental or otherwise, that we might want to make for 2004. Tonya has been rationalising EEBOchar.ent so as to remove duplication with the ISO sets and to organize the entities by class. The dtd could probably use some improvement as well; several possible changes are on my list of suggestions, including:

Allowing Q and LETTER within SP. (already done locally)
Modeling TRAILER on HEAD (already done locally)
Adding L to TRAILER and HEAD (already done locally)

Adding HEADITEM and HEADLABEL elements

These are TEI elements designed to tag the heads of the columns in a two-column list, so saving us the need either to distribute the heading throughout the list or turn it into a TABLE. E.g.:

DRAMATIS PERSONAE

Character                Actor

Mr. Blodgett............ Mark Sandler

Mrs. Blodgett........... Rina Kor

Their Cat, Sam.......... B. B. Schaffner

Cousin John............. Jonathan Blaney, Esq.

would be tagged as:

<LIST>
<HEAD>DRAMATIS PERSONAE</HEAD>
<HEADLABEL>Character</HEADLABEL><HEADITEM>Actor</HEADITEM>
<LABEL>Mr. Blodgett</LABEL><ITEM>Mark Sandler</ITEM>
<LABEL>Mrs. Blodgett</LABEL><ITEM>Rina Kor</ITEM>
<LABEL>Their Cat, Sam</LABEL><ITEM>B. B. Schaffner</ITEM>
<LABEL>Cousin John</LABEL><ITEM>Jonathan Blaney, Esq.</ITEM>
</LIST>

Making the TARGET attribute optional instead of required.

This would allow us to tag certain things, such as page numbers in tables of contents, appropriately, without committing us to linking.

DTD changes for proofing

Source: email
Date: 6 May 2003
Keywords: DTD

One of Rina's books reminded me to point out that the dtd can help you find aberrations in the text for repair. It is perfectly permissible to make a copy of the dtd (eebo2prf.dtd) in your working directory, give it a local name (e.g. temp.dtd) to avoid corrupting the main file, and make changes that will help you find problems. You then reference this local dtd instead of the main one by temporarily changing the DOCTYPE declaration in your document from a PUBLIC to a SYSTEM one, e.g.

<!DOCTYPE ets SYSTEM "temp.dtd">

In this case, PDCC was terminating some speeches at stage directions, and allowing the subsequent lines (<L>) of the speech to sit outside the <SP> tags--between speeches, simply within the <DIV>. Once the local dtd is changed to disallow L within DIV1 or DIV2, the validator will quickly find all the lines that so offend.

(Given our experience with the intransigence of XMetaL, this might be something safer tried only within the TextPad/NSGMLS editing duo.)
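The DOCTYPE switch described above can also be done by script rather than by hand. What follows is only a minimal sketch in Python, not part of our established workflow: the function name use_local_dtd and the public identifier in the sample are made up for illustration, and a declaration that carries an internal subset ([...]) is not handled.

```python
import re

# Matches a DOCTYPE declaration that uses a PUBLIC identifier, optionally
# followed by a system identifier. Declarations with an internal subset
# ([...]) are deliberately not handled in this sketch.
PUBLIC_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[\w.-]+)\s+PUBLIC\s+"[^"]*"(?:\s+"[^"]*")?\s*>',
    re.IGNORECASE,
)


def use_local_dtd(text, local_dtd="temp.dtd"):
    """Return text with its first PUBLIC DOCTYPE pointed at a local dtd."""
    def swap(match):
        # Keep the document element name, replace the external identifier.
        return '<!DOCTYPE {} SYSTEM "{}">'.format(match.group("root"), local_dtd)

    new_text, count = PUBLIC_DOCTYPE.subn(swap, text, count=1)
    if count == 0:
        raise ValueError("no PUBLIC DOCTYPE declaration found")
    return new_text


if __name__ == "__main__":
    # The public identifier below is a hypothetical placeholder.
    sample = '<!DOCTYPE ets PUBLIC "-//EXAMPLE//DTD placeholder//EN">\n'
    print(use_local_dtd(sample), end="")
```

Run it on a working copy only, and restore the original PUBLIC declaration before the file goes back.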

Muddled images

Source: email
Date: 28 June 2002
Filename: Ww3045a
Vid: 35932
Keywords: image set

Query The image set for this text is all mixed up. The title page is followed by pages 4 & 5, i.e. there are 2 pages missing. However, at the end of the text, the title page and first few pages are duplicated, including those that were missed the first time round. Apex have inserted the missing pages in the correct position, with the correct PB REFs. At the end, therefore, there are duplicate page gaps for each page except image 48, which now appears at the beginning of the text.

The PBs are now a little jumbled but this seems more helpful than recording the data at the end, amongst the duplicate page gaps. Does this seem ok?

Answer Yes. I've always supposed that our aim is to represent, so far as the images allow, the BOOK--even at the risk of representing a slightly idealized, "as intended," form of the book. So yes, we should be free to rearrange page images, even pick and choose the best images if we are faced with sets of duplicates, so as to create a coherent book in the original, intended sequence. This will sometimes create strange <PB> sequences:

<PB REF="1" N="1"> <PB REF="32" N="2"> <PB REF="3" N="3"> etc.
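When reviewing a file whose images have been rearranged like this, it can help to list the places where the REF values step backwards, so that each one can be checked against the images. The following is a rough sketch in Python, not an existing project tool: the function name out_of_sequence_refs is invented here, and it assumes that REF values are plain numbers that normally ascend through the file.

```python
import re

# Finds <PB ...> tags and captures a numeric REF attribute value.
PB_REF = re.compile(r'<PB\b[^>]*\bREF="(\d+)"', re.IGNORECASE)


def out_of_sequence_refs(text):
    """Return (previous_ref, ref) pairs where REF steps backwards."""
    refs = [int(m.group(1)) for m in PB_REF.finditer(text)]
    # Compare each REF with the one before it, in document order.
    return [(a, b) for a, b in zip(refs, refs[1:]) if b < a]


if __name__ == "__main__":
    sample = '<PB REF="1" N="1"><PB REF="32" N="2"> <PB REF="3" N="3">'
    print(out_of_sequence_refs(sample))  # prints [(32, 3)]
```

A reported pair is not an error in itself. As the answer above says, such sequences are expected wherever pages have been put back into their intended order.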

PDF with cropped images

Source: email
Date: 11 Apr 2002
Vid: 14740
Keywords: image set

Query Do the vendors see exactly the same image that we do? On this image they have twice supplied characters which are simply not present because the page is cropped. In marginal note 2 they added an f to "o$ pleasure", and at the bottom they supplied a number 1 in "Silius Ita$: lib 1. Virtus loqui|tur". [etc.]

Answer Aggh and again aggh. It has been our assumption that we look at the same images as they do. They actually look at the raw image (tiff) files; we look at the tiffs "wrapped" on the fly into a pdf file. But the source is the same and the images should be the same. HOWEVER, looking at your page online (which is actually a gif file generated automatically from the underlying tiff) and comparing the downloaded pdf, the .pdf file markedly crops the image, at least on the right margin (!).

Looking at the right margin of the online gif (< tiff), for example, I can read:

Virtus hominis
proprium bonum.
Tacitus lib: 4.
[pdf cuts off in middle of last m]

1. Moderation
of anger.

2. Contempt of
pleasure.
[pdf cuts off at right edge of "o"]

3. Abstinence
from covet$
ousnes
[pdf cuts off just after the "t"]

Silius Ita$: lib 1
[pdf cuts off after "b"]

Virtus loqui-
tur
[pdf cuts off at beginning of hyphen]

I really don't know what the long-term implications of this are. One thing that is clear is that if the vendors are seeing characters at the edge that you think are guesses, the odds are that they are really there. You may want to look at the online image to confirm. It is possible that we may have to end up using the raw tiffs to proof by, downloading them from UMI just as the vendors do. I hope not.