Matters Miscellaneous

  1. Superscripts

Question: In the advertisement at the back of this text what I assume is the book format is given as a number and what looks like a degree symbol (ie 4^o for quarto, 8^o for octavo). Apex have captured them as 4&deg. etc. Should these be changed to 4^0 or is there something else I should do?

We've been changing these to 4^o, 8^o (that is, 4 and superscripted lower-case "o"), since the "o" is really just the final "o" of "quarto", etc. We've done the same in the case of regnal years (e.g. "3^o Hen.8" = third year of Henry VIII). In both cases, vendors have been known to use °. ° may not be wholly wrong, since one could imagine that the origin of the "degree" sign is none other than a superscripted "o"; but we have, perhaps foolishly, tried to reserve "°" for uses which modern usage makes us think more appropriate: degrees of latitude or measures of angles, for example.

Question: In this text, there are two lines with numbers above certain words, which has been captured as:

<L>1 2 3 1 2 3</L>
<L>For Streams, Iuice, Balm they are, which que~ch, kils, charms,</L>
<L>1 2 3 1 2 3</L>
<L>Of GOD, Death, Hell, the wrath, the life, the harmes.</L>

How ought they to be captured?

In the past we've resorted to capturing these as simple superscripts, as follows:

<L>For Streams^1, Iuice^2, Balm^3 they are, which que~ch^1, kils^2, charms^3,</L>
<L>Of GOD^1, Death^2, Hell^3, the wrath^1, the life^2, the harmes^3.</L>

This is a pretty simple example (they can get pretty fancy), but in every case the numbers are intended to link separated words, sometimes so as to
provide alternative reading sequences. Here they seem simply to identify words that should be taken together even if not read together: streams
quench, juice kils, balm charms; wrath of God, life of death, harmes of hell. There are some fancy pointer or CORRESP things that one could do in
TEI, but we have thought simple superscripts served.



2. Clarifying UNCLEAR

When and if you feel driven to use the UNCLEAR tag (e.g. to capture some text that would be usefully searched but which you cannot be even reasonably sure is really there), please do not use it around individual letters or strings within words--just as with <HI>. A single word should be the minimum contents of <UNCLEAR>. Otherwise, the word becomes unsearchable, which defeats the greater part of the purpose in using the tag at all.
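
One way to check a file against this rule is to look for <UNCLEAR> tags that open or close in the middle of a word. The following is only a rough sketch: the function name is made up for illustration, and it assumes that a "word" is a run of ordinary letters and digits, so words containing ~, ^ or entities will need checking by eye.

```python
import re

# An <UNCLEAR> that starts or ends directly against a letter or digit is
# wrapped around part of a word rather than the whole word.
# (Assumption: \w is a good enough stand-in for "word character" here.)
PARTIAL_UNCLEAR = re.compile(r"\w<UNCLEAR>|</UNCLEAR>\w")

def find_partial_unclear(text: str) -> list[str]:
    """Return the places where an UNCLEAR tag abuts a word character."""
    return [m.group(0) for m in PARTIAL_UNCLEAR.finditer(text)]

good = "the <UNCLEAR>word</UNCLEAR> here"
bad = "the w<UNCLEAR>or</UNCLEAR>d here"
print(find_partial_unclear(good))
print(find_partial_unclear(bad))
```

Anything the function reports is a candidate for widening the tag to the whole word.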




3. Long or short lines in verse

Question: In one collection of verse, indented lines were treated as part of the line before, making verses four lines long rather than eight.  This meant that when there was an extra word in brackets to be put at the end of the line, it was put at the end of the indented line.
 eg.

                                      (Larks,
And then with Woodcocks and with
        she must rise up and dine:

 became:

And then with Woodcocks and with she must rise up and dine: Larks,

Instead of:

And then with Woodcocks and with Larks, she must rise up and dine:

 I corrected these, but wasn't sure about putting additional  <L>s in.  The rhyme scheme indicates four line verses, and the commas at the end of the lines could be a caesura, rather than a real end of line. However, in the past I think we have put a new <l> whenever there has been a line break.

 We've generally tried to take into account both rhyme scheme and capitalization (sometimes even resorting to a modern edition) in deciding whether to capture as short lines or as long lines with caesura. In this case, the capitalization and rhyme scheme support long lines and should probably override the existence of the physical line break as an indicator of metre. We could capture the line above ("And ... dine") as a single verse line (<L>).  If this involves changing the existing tagging, note that the change can often be largely automated, making use of the case of the first letter of the second half-line. E.g., replace "</L>\n<L>\([a-z]\)" with " \1".
 There will of course be times when the case is too ambiguous, or the change too inconvenient (or both), to justify doing anything but leaving in place what has been done.
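
For anyone working outside an editor with emacs-style search and replace, the same substitution can be sketched in Python. This is an illustration only: the function name is invented here, and it assumes each <L> sits on its own line with nothing between </L> and the following <L> but a single newline.

```python
import re

# Python spelling of the replacement quoted above: the break between </L> and
# <L> collapses to a space when the next line begins with a lower-case letter.
HALF_LINE_BREAK = re.compile(r"</L>\n<L>([a-z])")

def merge_half_lines(sgml: str) -> str:
    """Join a lower-case half-line onto the verse line before it."""
    return HALF_LINE_BREAK.sub(r" \1", sgml)

sample = (
    "<L>And then with Woodcocks and with Larks,</L>\n"
    "<L>she must rise up and dine:</L>\n"
    "<L>And then another verse begins</L>"
)
print(merge_half_lines(sample))
```

Lines that begin with a capital are left alone, so the result still needs a read-through for proper names and other capitals that happen to open a second half-line.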




4. Abbreviations: entities and more

  Question: Does using the entity &abis; mean we are sure the word expands to -is rather than -es? In this particular case that would expand the word to "subiectis".

 The symbol represented by &abis; may better be thought of as having several possible expansions, including -is, -es, maybe even -s (though in the last case why not just use "s"?), and may have been used as a plural marker without anyone necessarily even considering how it would be expanded if spelt out.

Similarly, the "-er-" symbol may often be expanded as "-re-", &abper; can often more rightly be expanded to "par" than to "per", &quod; can be expanded as "quoth" as well as "quod", etc.

Question: What should I do with the symbol "ll" (double-ell) meaning "pounds"?

Rather than invent a new entity, follow the usual rules and capture it as "ll" within abbreviation tags, i.e. <ABBR>ll</ABBR>.

Topic: "h" and "d" in Caxton

We've adopted the following rather brutal strategy for dealing with the dual forms of "h" in some Caxton books, one of which has a squiggle over it, and which the vendors have (following directions) treated either as h~ or by placing the whole word in <ABBR> tags. The trouble is that none (or virtually none) of the words with this form of h seems actually to be an abbreviation of any kind. E.g. h~im for him, th~is for this, <ABBR>knightes</ABBR> for knightes. So we've decided (unless someone can prove otherwise) that the strokes on these funny "h"s should be regarded as superfluous ('otiose strokes' in paleography talk) and insignificant.
On the other hand, there are a LOT of legitimate <ABBR>s in these files, mostly arising from the final 'd' with curlicue that stands, supposedly, for final -e. Our solution was to exclude the abbreviations ending in -d, then remove all the <ABBR> tags from words containing 'h', as well as removing ~ following h. Like this:

(1) replace

  h~

with h

(2) replace

d\([. ,]*\)</ABBR>

with

d\1</ABBR}

(3) replace

<ABBR>\([^<]*h[^<]*\)</ABBR>

with

\1

(4) replace

</ABBR}

with

</ABBR>
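
The four replacements can also be run in one pass from a script. The following Python version is a sketch rather than the procedure actually used: the function name is invented, the emacs-style \( \) groups are rewritten as Python ( ) groups, and the tags are assumed to be spelt <ABBR> in upper case throughout.

```python
import re

def strip_otiose_h(text: str) -> str:
    """Apply the four Caxton 'h' replacements in order."""
    # (1) Drop the tilde after h.
    text = text.replace("h~", "h")
    # (2) Protect abbreviations ending in d by giving them a temporary,
    #     deliberately malformed closing tag.
    text = re.sub(r"d([. ,]*)</ABBR>", r"d\1</ABBR}", text)
    # (3) Unwrap every remaining <ABBR> whose content contains an h.
    text = re.sub(r"<ABBR>([^<]*h[^<]*)</ABBR>", r"\1", text)
    # (4) Restore the protected closing tags.
    text = text.replace("</ABBR}", "</ABBR>")
    return text

sample = "h~im and <ABBR>knightes</ABBR> and <ABBR>hand</ABBR>"
print(strip_otiose_h(sample))
```

The order matters: step (2) has to run before step (3), or the legitimate final-d abbreviations that happen to contain an h would lose their tags as well.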
 
 




 

5. Tagging "Explicit"

We do not see many "explicits"; those that we do, we have been tagging as <TRAILER> (rather than as <CLOSER>, <P>, or <STAGE>, among others),
because they typically have a HEAD-like character. <TRAILER> more or less corresponds to <HEAD> just as <CLOSER> corresponds to <OPENER>. E.g.:

<HEAD>Hymnus.</HEAD> ....   <TRAILER>Explicit hymnus</TRAILER>

<HEAD>Gloss.</HEAD> .... <TRAILER>Explicit comment.</TRAILER>

<HEAD>Here beginneth the Commentary of <HI>Island.</HI></HEAD> ...
<TRAILER>Here endeth the first part of the Commentarie.</TRAILER>

<HEAD>&para; Here bynnen the chapytours of the .ii. boke / </HEAD> ...
<TRAILER>&para; Here endeth the chapytours of y^e seconde boke.</TRAILER>
 

 <HEAD>Here beginnith the boke of the libelle of English policie</HEAD> ...
<TRAILER>Explicit libellus de Politia conseruatiua maris.</TRAILER>

<HEAD>Actus secundus, Scena prima.</HEAD> ...
<TRAILER>Explicit Actus secundus.</TRAILER>



6. Editing tables


Question:  Is there a way to facilitate editing tables?

Paul replies: I'm thinking about whether we can somehow use the true HTML table model within our dtd (instead of the TEI model). TEI and HTML are readily
convertible at this point. Changing the vendors' dtd would be a nuisance; changing the data already delivered would be a nuisance. One option might be to
add a third dtd for reviewers that uses the HTML model, and convert back into eebo2prf.dtd as part of the delivery process.

Less satisfactory, but much easier, would be a two-way temporary conversion that would allow people to edit tables in HTML, then convert them back to TEI
for reinsertion in the document.

I've had a go at producing this, as follows:

  (a) a batch file tab2htm.bat that uses a perl script tab2htm.pl to change all files named table*.sgm to table*.html

  (b) a batch file tab2sgm.bat that uses a perl script tab2sgm.pl to change all files named table*.html to table*.sgm

  (c) to be used thus: having a file with a nasty table, cut the table out of the file and paste it into a new file. Leave a marker in the original file to show where the table belongs (e.g.: {table1}). Name the new file something similar (e.g. table1.sgm). Type tab2htm at the DOS prompt to turn it into an HTML file (e.g. table1.sgm.html). *Edit the HTML file* in textpad. Right-click periodically and select "display in browser" to see how your changes affect the layout of the table. When it looks ok, save the HTML file. Then type tab2sgm at the DOS prompt. Open the resultant file (e.g. table1.sgm.html.sgm). Cut-and-paste the table back into the original file in place of the marker {table1}.

  It's a bit clumsy, but I just used it to fix two tables, so it can work.
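
To give an idea of what the forward conversion involves, here is a small Python sketch. It is not the tab2htm.pl script itself, whose contents are not reproduced here. The function name is invented, and the sketch assumes the TEI table is made of <TABLE>, <ROW> and <CELL> elements, with spanning expressed by ROWS and COLS attributes on <CELL>.

```python
import re

def tei_table_to_html(sgml: str) -> str:
    """Rename TEI table tags to their nearest HTML equivalents."""
    html = sgml
    for tei, htm in (("TABLE", "table"), ("ROW", "tr"), ("CELL", "td")):
        html = re.sub(rf"<{tei}\b", f"<{htm}", html)
        html = re.sub(rf"</{tei}>", f"</{htm}>", html)

    # Inside cell start-tags only, map the TEI spanning attributes onto
    # the HTML ones. Other attributes are passed through untouched.
    def fix_spans(match: re.Match) -> str:
        tag = match.group(0)
        return tag.replace("ROWS=", "rowspan=").replace("COLS=", "colspan=")

    return re.sub(r"<td\b[^>]*>", fix_spans, html)

sample = '<TABLE><ROW><CELL COLS="2">a</CELL></ROW></TABLE>'
print(tei_table_to_html(sample))
```

The reverse conversion would swap the names back in the same way. A plain renaming like this leaves anything else inside the cells as it was, so it is only good for a temporary editing copy of the kind described above.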

Question: There are at least some, and maybe many, bad tables in the texts already online--bad in the sense that they
display incorrectly and even unintelligibly. How can we fix these and how can we prevent more bad tables from joining them?

Paul again: Let me know when you spot one. I'll try to fix it. As for preventing them, all I can think of is to make it easier to view and edit tables while doing text review.



7. Acrostic poem

Encountering an acrostic poem in which the first letter of each line is set off from the rest of its word in a column to itself, in order to emphasize the acrostic character of the poem, we tagged the first letter separately with <HI> (since it was clearly being highlighted by the columnar arrangement).

<DIV1 TYPE="acrostic poem">
<PB REF="1" MS="y">
<HEAD> ... EDVARDO HIDE ... CARMEN GRATVLATORIVM. an. 1660</HEAD>
<DIV2 TYPE="version" LANG="lat">
<LG>
<L><HI>E</HI>rige depressam Victrix Academia cristam,</L>
<L><HI>D</HI>ii Tutelares rem Tibi nuper agunt:</L>
<L><HI>V</HI>enit lo extorris (praeclarior ostracismo)</L>
<L><HI>A</HI>nchora quassat&acirc; deficiente rati:</L>
<L><HI>R</HI>egibus ante omnes, Patri, Natoque fidelis</L>
<L><HI>D</HI>extera; captivo deliciae populo:</L>
<L><HI>V</HI>irtut&igrave;sque supremus apex: hunc <HI>Anglia</HI>
              prim&ugrave;m</L>
<L><HI>S</HI>uscipit in gremium, jam capit <HI>Oxonium</HI>.</L>
</LG>
<LG>
<L><HI>H</HI>erculeus labor! at languens Ecclesia poscit,</L>
<L><HI>I</HI>nque Tuas gestit tradere sacra manus:</L>
<L><HI>D</HI>urius incumbens humeris onus utile Nobis</L>
<L><HI>E</HI>xpulsos revocas, restituisque Deos.</L>
</LG>
<LG>
<L><HI>C</HI>os Musis, cui terra minax, pelagusque pepercit</L>
<L><HI>A</HI>nnue, dans votis vela secunda meis:</L>
<L><HI>N</HI>utriat admissos (sit anus lic&egrave;t) ubere pleno</L>
<L><HI>C</HI>asta parens olim, nunc tibi filiola.</L>
<L><HI>E</HI>numerans Mecaenates sis &igrave;pse patronus</L>
<L><HI>L</HI>iber, ut excellens Bibliotheca crepat:</L>
<L><HI>L</HI>eg&igrave;s-latores auge, jurisque peritos,</L>
<L><HI>A</HI>uthores nullo cod&igrave;ce deficiant:</L>
<L><HI>R</HI>edde ant&igrave;qua, &amp; er&igrave;s Nob&igrave;s Tu Stator
               Apollo;</L>
<L><HI>(J</HI>upiter hoc famam nomine nactus erat.)</L>
<L><HI>V</HI>i cum serus ob&igrave;s post Te monumenta relinquens</L>
<L><HI>S</HI>umptibus hic cedant <HI>Seldeniana</HI> Tuis.</L>
</LG>
</DIV2>
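
One side benefit of tagging the initials separately is that the acrostic can be read back out of the file mechanically. The following Python sketch shows the idea. The function name is invented, and it assumes each verse line begins with <L><HI> followed by exactly one initial, possibly preceded by punctuation as in the "(J" line above.

```python
import re

# The first letter inside the <HI> that opens a verse line. Punctuation
# before the letter, such as an opening bracket, is skipped.
INITIAL = re.compile(r"\s*<L><HI>\W*(\w)</HI>")

def acrostic(lines: list[str]) -> str:
    """Collect the highlighted initials of the lines that have one."""
    letters = []
    for line in lines:
        match = INITIAL.match(line)
        if match:  # continuation lines carry no initial and are skipped
            letters.append(match.group(1))
    return "".join(letters)

stanza = [
    "<L><HI>H</HI>erculeus labor! at languens Ecclesia poscit,</L>",
    "<L><HI>I</HI>nque Tuas gestit tradere sacra manus:</L>",
    "<L><HI>D</HI>urius incumbens humeris onus utile Nobis</L>",
    "<L><HI>E</HI>xpulsos revocas, restituisque Deos.</L>",
]
print(acrostic(stanza))
```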



8. "Spoken by..." in plays

Question: Should "spoken by..." in plays be tagged as HEAD, STAGE, or BYLINE?

(1) Use <HEAD> as an acceptable fallback, especially if the 'spoken by...' portion is not typographically distinct from the rest of the <HEAD>; and (2) if there is reason (e.g., separate line or other visual distinction) to tag the line separately, then we prefer <STAGE> to <BYLINE>.

Since 'spoken by' lines attribute performance, rather than authorship, we've always been dubious about tagging them as BYLINE (though responsibility for performance is analogous to responsibility for authorship, and there are things to be said in favor of both).



9. Letters for rubricator

Question: In this text  there are indented lines with a corresponding letter next to it (i.e. y Zores...).  Should we capture it as note or should we treat them as <l>?

These little letters are instructions to a rubricator to come along and add hand-drawn large initials in the space--which was never done. Accordingly, they should
be treated as if they *were* the large 'drop caps' that belong there, and included as part of the first word on the first indented line. E.g., on image 21, left column,

  <L>nYchole le mostardier</L>
  <L>A bon vinaigre</L>
 
which matches this in the right hand column:

  <L>nYcholas the mustardmaker.</L>
  <L>Hath <ABBR>good</ABBR> vynegre</L>
 
Likewise, further down the same page, we find

  <L>oLiuier ...  and <L>oLyuer ...
 
and on the right hand page

  <L>pyere le bateure de lame</L>
 
corresponding to the English

  <L>pEter the betar of wulle</L>
 

