Characters



1. Z and yogh in Scottish texts

Question: In this Scottish text, "z" stands for historic "yogh" (and is represented in some modern editions as "y"). How would we render it?

Our general rule is to leave letters as printed, even when the printed letter is a merger of two historically distinct letters (thus we print "ye" for both historic "ye" and historic "þe", since both came to be printed "ye"). We have made something of an exception for yoghs in Scottish books--"something" of an exception because it is not clear that it really is one, since one can often make out a real physical distinction between the letter used for zed and the letter used for yogh.

In the book under discussion there are no cases of true zed to compare against, but judging from a list of the words in the book keyed with "z", we should change the zeds to yoghs.
Examples:
#  zit
#  zour
#  zour
#  zour
#  zour
#  zour
#  zour
#  Zour
#  Zour
#  Zit
#  feinze
#  ze
#  zour
#  fenzeit
#  Zow
#  ze
#  vailzeand

We have different character entities for upper- and lower-case yogh, namely &YOGH; and &yogh; respectively. So we should change z to &yogh; and Z to &YOGH;.
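
The substitution described above can be sketched in a few lines of Python. This is only an illustration, not part of the keying instructions: the function name z_to_yogh is hypothetical, and the blanket replacement is appropriate only for a book like the one under discussion, in which every "z" has been judged to be a yogh.

```python
# Map each case of z to the corresponding yogh entity named in the text.
_YOGH_TABLE = str.maketrans({"z": "&yogh;", "Z": "&YOGH;"})


def z_to_yogh(text: str) -> str:
    """Replace every z/Z with &yogh;/&YOGH; (assumes no true zeds occur)."""
    return text.translate(_YOGH_TABLE)


if __name__ == "__main__":
    for word in ["zit", "Zour", "feinze", "vailzeand"]:
        print(word, "-->", z_to_yogh(word))
```

In a book that also contains true zeds (e.g. "zodiaco"), a reviewed word list would be needed instead of a blanket change.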




2. Other uses of z

Lower-case z is one of the most difficult characters in early printing, difficult because it functions much more ambiguously than any other. Precisely because it is so ambiguous, we have not asked the vendors to distinguish between more than a few of its uses, those in which context makes the meaning clear.

o The basic character is 'z'
  
    Many of the examples are in fact simply examples of 'z'. We would certainly capture the character as 'z' in (example 8; see below for context) zodiaco; (exx. 6 and 7) ziniar; and (ex. 5) Azoubor.
    
o The same piece of type is used as a form of final '-m'
  
    In example 2 (below) 'hoi~ez' is actually 'hoi~em', an abbreviation for 'hominem'; in ex. 4 'oe~z' is actually 'oe~m', an abbreviation for 'omnem'; and in example 3 'ai~az' is actually 'ai~am', an abbreviation for 'animam'.
  
      **The question is: how should the vendor capture these?
       In the past, we have assumed that they would capture them
       as 'z' and leave it to us to change some of them to 'm'
       during review. This still seems the most reasonable suggestion.**
      
o The same piece of type is used as an abbreviation marker, especially in particular words and contexts, a couple of which survive into modern English (like 'oz.' for 'ounces' and 'viz.' for 'videlicet').
  
     -  In a couple of cases where the context is predictable, we've provided special entities for these abbreviations, e.g.:
          
          -qz  (abbrev. for -que) -->  capture as &abque;
          -bz  (abbrev. for -bus) -->  capture as &abbus;

     -  Otherwise, we are content to leave these as 'z'. Two examples are (1 and 3) 'sz' and  (ex. 4) 'scz' (both of which are abbreviations for 'scilicet'). So:
        
          scz (abbrev. for 'scilicet') --> capture as scz
          sz  (abbrev. for 'scilicet' etc.) --> capture as sz
          viz (abbrev. for 'videlicet') --> capture as viz
          
o The same piece of type can mean some other things too, especially
  in Scottish books (see above).

  
Our conclusion is that vendors should capture all of these as 'z' except the few cases (&abbus;, &abque;) for which we have provided special entities for the abbreviations in question. In-house, we will continue to convert some of these to '-m'.
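
As a rough sketch of the rules summarized above: word-final -qz and -bz become the two special entities, every other 'z' is left alone, and a reviewer may later supply a list of forms in which the final 'z' is really '-m'. The function name capture_z and the reviewed_m_forms parameter are hypothetical, and the test words "quoqz" and "omnibz" are made-up inputs rather than examples from the book.

```python
import re

# Word-final abbreviations for which special entities have been provided.
_ENTITY_RULES = [
    (re.compile(r"qz\b"), "&abque;"),  # -qz, abbreviation for -que
    (re.compile(r"bz\b"), "&abbus;"),  # -bz, abbreviation for -bus
]


def capture_z(text, reviewed_m_forms=()):
    """Apply the entity rules; convert z to m only in reviewer-listed forms."""
    for pattern, entity in _ENTITY_RULES:
        text = pattern.sub(entity, text)
    for form in reviewed_m_forms:
        if not form.endswith("z"):
            continue  # only forms ending in z are candidates for -m
        # Match the form as a whole token, not inside a longer word.
        token = re.compile(r"(?<![\w~])" + re.escape(form) + r"(?!\w)")
        text = token.sub(lambda _match, f=form: f[:-1] + "m", text)
    return text


if __name__ == "__main__":
    print(capture_z("quoqz omnibz scz viz"))
    print(capture_z("mouent hoi~ez ad audaciam", ["hoi~ez"]))
```

Forms such as 'sz', 'scz' and 'viz' pass through unchanged, in line with the conclusion above.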

Transcriptions of examples (with a little surrounding context). Cases in which the -z is actually a form of -m are indicated by [m] in brackets.
----------------------------------------------------------------------

Example 1:
quasi forrilamina est. sz. viridis habet maculas albas.

Example 2:
mouent hoi~ez[m] ad audaciam ita &abquod; non timet morte~

Example 3:
&abquod; de+siderat &amp; diu no~ credidi <ABBR>illd</ABBR> Sz
   post&abquam; legi libros
q~ &abquam;uis faciat <ABBR>illd</ABBR> r~putani&abus; tn~ ai~az[m]
   facere / ia~ publicu~

Example 4:
ad efficacia~ ad oe~z[m] re~ &abquod; vult homo
&amp; sunt <ABBR>secund</ABBR> duos modos / vno scz iam dicto.
  scz <ABBR>scdm</ABBR> affectione~

Example 5:
Et quando submergis Azoubor moritur. &amp; roretur super ipsum
  acetum: vinificatur Et

Example 6:
super ipsum aliquid ex ziniar. deinde funda super illud ex illa
  pinguedine liquefacta

Example 7:
imagines &amp; la+pides: accipe ziniar &amp; tere bene. &amp; accipe
  pannum funeris

Example 8:
locu~ folis &amp; lune in zodiaco quolibet die:




3. I/J

We do *NOT* globally change upper-case J to I. That suggestion was removed from the keying instructions very early on (years ago) and replaced with the more sensible rule: record "I" as "I" and "J" as "J"; but if they are formally indistinguishable, use "I".

Question: There was an I/J issue in my last text (Ws2599, vid 53988). However, I could only tell the "I"s were meant to be Js because this text happened to have an italicized true I which I happened to see. The same character is used interchangeably in other texts. Are we now meant to try to find out how the I/J character is being used in each text and correct accordingly? I didn't correct them in this case, and I think here in Oxford we favour keeping them as I. Don't want to cause a transatlantic rift in keying practice though.


There are two issues: one is what the keyers should be doing (and how we should keep them to it). The other is what we should do in the way of correction if they get it wrong.

For the former, we don't see that I/J is any different from any other character recognition problem: that is, we expect the keyers to honour the distinctions made by the typeface(s) used in the book in hand, regardless of how the form is used elsewhere. Only if there is no local distinction, or if the distinction is murky or hard to discover (e.g., the true "I" or true "J" is rare), should they be content to drop down to the "everything is I" fallback strategy.

As for the latter, we hope we can persuade the vendors to get it right every time, and make this just a temporary issue. In the meantime, the suggested approach has depended much on the experience of a particular book that you gain from proofing. For example, we've favored a more or less global change of I[vowel] to J[vowel] (after viewing a list of the words affected), only if during proofing it becomes clear that (1) the book uses only typefaces that maintain the I/J distinction; (2) the book uses modern conventions for distributing I and J (vowel/glide); and (3) the keyers have got it thoroughly wrong. The global approach (or semi-global, done by turning a list of affected words into a set of perl substitutions) is the only one that I'd recommend in most cases, since anything else is usually incredibly time-consuming.

Which is not to say that some repair is not possible even if all these conditions do not hold; e.g., if the keyers get it wrong only in italics, it may be possible to isolate the cases that occur in italics; if the book is very small, it may be possible to repair all the examples individually. Worst are the cases in which a book has several typefaces, one of which has no I/J distinction (e.g. a blackletter), and one of which does (e.g. a roman), but the keyers have ignored it. We've left those as they are, since the only way to fix them would be to go case by case.

You can find some typical examples from three books (one done by each firm) on the web at

   http://www.lib.umich.edu/eebo/docs/dox/ij.html

Question: Should we be trying to find examples of I and J in every book and comparing them to check that the vendor has done this correctly? We may not notice any problem when proofreading, unless examples of I and J happen to occur close together on the page.

The problem is largely confined to italic; italic J is *always or almost always different from italic I.* (It's only in blackletter that they merge.) So if you're proofing along and find "Iustice" where the book *appears* to say "Justice", the odds are that there is a problem, and that it really does say "Justice". It takes about five seconds (say, by searching for <HI>I[^aeiou]) to confirm that there is in fact a distinct form for "I". So the issue is not whether to take the trouble to check ('cause it's really no trouble), but whether we'd rather know or avert our gaze! We'd rather know. Then we can at least proceed with our eyes open.
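
The quick search mentioned above might look like this in Python. It is a sketch only: it assumes italic text is tagged with <HI> as the search pattern implies, and the function name italic_initial_i is hypothetical. The code merely locates the candidates; comparing the letterforms still has to be done by eye against the page image.

```python
import re


def italic_initial_i(text):
    """Return italic words with initial I, split by the following letter."""
    # Initial I before a vowel: likely J-initial words in modern practice.
    before_vowel = re.findall(r"<HI>I[aeiou]\w*", text)
    # Initial I before anything else: likely true I, useful for comparison.
    before_other = re.findall(r"<HI>I[^aeiou]\w*", text)
    return before_vowel, before_other


if __name__ == "__main__":
    sample = "<HI>Iustice</HI> was done <HI>In</HI> haste"
    print(italic_initial_i(sample))
```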

Once we know, then things admittedly get a bit dicier. In a typical example, it took the aforementioned five seconds (well, ok, ten) to see that there was a problem--all one had to do was search for <HI>I[aeiou] and <HI>I[^aeiou] and see if the forms were different--they were. But it took about 5 minutes looking at a dozen examples to confirm that the book appears to be consistent in its modern I/J practice. And about 30 seconds to pull a list of words beginning with I[vowel], to see if all of them would be J-initial in modern practice. They probably would:

Iesus
Iews
Iohn
Iu|stice
Iudas
Iudg
Iudg|ment
Iudge
Iudge|ment
Iudgement
Iudgment
Iupiter
Iust
Iustice
Iustif
Iustif$|ing
Iustification
Iustinus

If we did a global change, these are the words that would be affected. But then what? Assuming that one cannot check each one, how do we feel about doing an approximate global change such as the following?

change \([> ]\)I\([aeiouy]\) to \1J\2   and
       ^I\([aeiouy]\) to J\1

This would produce (in this book) 79 changes, would change all the words listed above to "J-"; and would thereby change most (not necessarily all) "I"s into the correct "J"s, at the risk of changing some that are really printed as "I" and also at the risk of not changing some that are really printed as "J" but didn't get caught by the change. This inexact result could arguably be considered an improvement.
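
For anyone who wants to try the approximate change outside an editor, the two substitutions above can be folded into one Python regular expression. This is a hedged sketch rather than the procedure actually used: the function names are hypothetical, and the pattern inherits all the risks just described (a word genuinely printed with "I" before a vowel would also be changed).

```python
import re

# "I" at the start of a line, or after ">" or a space, followed by a vowel or y.
_INITIAL_I = re.compile(r"(^|[> ])I([aeiouy])", re.MULTILINE)


def affected_words(text):
    """List the distinct words the global change would touch, for review."""
    words = re.findall(r"(?:^|[> ])(I[aeiouy][\w|$]*)", text, re.MULTILINE)
    return sorted(set(words))


def global_i_to_j(text):
    """Apply the approximate change; return the new text and the change count."""
    return _INITIAL_I.subn(r"\1J\2", text)


if __name__ == "__main__":
    sample = "Iesus and <HI>Iohn</HI> Iudge In Israel"
    print(affected_words(sample))
    print(global_i_to_j(sample))
```

Reviewing the output of affected_words before running global_i_to_j corresponds to "viewing a list of the words affected" before making the change.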




4. Illegibilities

To re-cap our policy about capturing illegible words, letters, lines, etc.:

Use:

$ ($$ $$$ etc.) -- when individual letters are obscured, but at least a rough count can be obtained. We would rarely use this for more than (say) six letters. Above 6 letters or so we tend to use either:

$word$ -- for cases in which at least a rough estimate of words can be obtained (again, not more than six or so); or

$span$ -- for cases in which no count of words can readily be obtained, either because there are a lot of illegible words (i.e. six or so or more), or because the passage is so damaged that a word count is not possible.

In practice, these three markers -- bare $, $word$, and $span$ -- represent better than 90% of the illegibility that we find.

More rarely, a whole line or two of text is obscured, in which case we would use $line$ (this happens most often at the top or bottom of a page, where lines tend to get cropped off); or an entire $page$ or even a single $para$ [paragraph].
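
The choice among the three common markers can be summarized as a small decision function. This is an illustrative sketch under stated assumptions: the function name and parameters are hypothetical, the cut-off of 6 stands in for the deliberately loose "six or so" above, and the rarer $line$, $page$ and $para$ cases are left to the keyer's judgment. The policy does not say how $word$ is repeated for several words, so the sketch returns a single marker.

```python
MAX_COUNTABLE = 6  # stands in for "six or so"; the policy gives no hard limit


def illegibility_marker(letters=None, words=None):
    """Pick a marker from rough counts (None means no count is obtainable)."""
    if letters is not None and 1 <= letters <= MAX_COUNTABLE:
        return "$" * letters  # one $ per obscured letter
    if words is not None and 1 <= words <= MAX_COUNTABLE:
        return "$word$"  # a rough word estimate is available
    return "$span$"  # too much text, or too damaged to count


if __name__ == "__main__":
    print(illegibility_marker(letters=3))
    print(illegibility_marker(letters=9, words=1))
    print(illegibility_marker())
```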
