Characters
1. Z and yogh in Scottish texts
Question: In this Scottish text, "z" stands for historic "yogh" (and
is represented in some modern editions as "y"). How would we render it?
Our general rule is to leave letters as printed, even when the printed
letter is a merger of two historically distinct letters (thus we print "ye"
for both historic "ye" and historic "þe", since both came to be
printed "ye"). We have made something of an exception with regard to yoghs
in Scottish books. It is only "something" of an exception because it is not
clear that it really is one: one can often make out a real physical
distinction between the letter used for zed and the letter
used for yogh.
In the book under discussion, one finds no cases of true zed to compare, but
judging from a list of the words in the book printed with "z" (all of which
call for yogh), we should change the "z"s to yoghs.
Examples:
# zit
# zour
# zour
# zour
# zour
# zour
# zour
# Zour
# Zour
# Zit
# feinze
# ze
# zour
# fenzeit
# Zow
# ze
# vailzeand
We have different character entities for upper- and lower-case yogh, namely
&YOGH; and &yogh; respectively. So we should change z to &yogh;
and Z to &YOGH;.
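A minimal sketch of that replacement in Python, assuming (as in the book under discussion) that every printed z really is a yogh. The function name is ours, for illustration only, and a book that also contains true zeds would need a word list rather than a blanket change.

```python
import re

# Assumption: every z/Z in this particular book is a yogh (the book has no
# true zeds), so a blanket replacement is safe here and only here.
YOGH_ENTITIES = {"z": "&yogh;", "Z": "&YOGH;"}

def z_to_yogh(text: str) -> str:
    """Replace each z with &yogh; and each Z with &YOGH;."""
    return re.sub(r"[zZ]", lambda m: YOGH_ENTITIES[m.group(0)], text)

# A few of the words from the list above.
for word in ["zit", "Zour", "feinze", "vailzeand"]:
    print(word, "->", z_to_yogh(word))
```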
2. Other uses of z
Lower-case z is one of the most difficult characters in early printing,
difficult because it functions much more ambiguously than any other. Precisely
because it is so ambiguous, we have not asked the vendors to distinguish
between more than a few of its uses, those in which context makes the meaning
clear.
o The basic character is 'z'
Many of the examples are in fact simply examples of 'z'.
We would certainly capture the character as 'z' in (example 8; see below
for context) zodiaco; (exx. 6 and 7) ziniar; and (ex. 5) Azoubor.
o The same piece of type is used as a form of final '-m'
In example 2 (below) 'hoi~ez' is actually 'hoi~em', an
abbreviation for 'hominem'; in ex. 4 'oe~z' is actually 'oe~m', an
abbreviation for 'omnem'; and in example 3 'ai~az' is actually 'ai~am',
an abbreviation for 'animam'.
**The question is: how should the vendor capture these? In the past, we have
assumed that they would capture them as 'z' and leave it to us to change some
of them to 'm' during review. This still seems the most reasonable
suggestion.**
o The same piece of type is used as an abbreviation marker, especially
in particular words and contexts, a couple of which survive into modern English
(like 'oz.' for 'ounces' and 'viz.' for 'videlicet').
- In a couple of cases where the context is predictable,
we've provided special entities for these abbreviations, e.g.:
-qz (abbrev. for -que) --> capture as &abque;
-bz (abbrev. for -bus) --> capture as &abbus;
- Otherwise, we are content to leave these as 'z'. Two examples are
(exx. 1 and 3) 'sz' and (ex. 4) 'scz' (both of which are abbreviations
for 'scilicet'). So:
scz (abbrev. for 'scilicet') --> capture as scz
sz (abbrev. for 'scilicet' etc.) --> capture as sz
viz (abbrev. for 'videlicet') --> capture as viz
o The same piece of type can mean some other things too, especially
in Scottish books (see above).
Our conclusion is that vendors should capture all of these as 'z' except
the few cases (&abbus;, &abque;) for which we have provided special
entities for the abbreviations in question. In-house, we will continue to
convert some of these to '-m'.
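The capture rule can be sketched as a small Python function. This is an illustration under two assumptions of ours: that the entity stands for the whole ending (the letter plus the z-shaped mark), and that the abbreviation falls at the end of a word. The function name and the inputs "atqz" and "omnibz" are made up for the example.

```python
import re

# Assumption: the entity replaces the whole ending (letter plus z-shaped
# mark), and the abbreviation sits at the end of a word.
SPECIAL_ENTITIES = {"qz": "&abque;", "bz": "&abbus;"}
FINAL_ABBREV = re.compile(r"(qz|bz)$")

def capture_word(word: str) -> str:
    """Swap a final -qz or -bz for its entity; leave every other z alone."""
    return FINAL_ABBREV.sub(lambda m: SPECIAL_ENTITIES[m.group(1)], word)

# "atqz" and "omnibz" are made-up inputs; the rest come from the examples.
for word in ["atqz", "omnibz", "scz", "sz", "viz", "zodiaco", "hoi~ez"]:
    print(word, "->", capture_word(word))
```

Note that 'hoi~ez' comes through unchanged: turning that z into '-m' stays an in-house review decision and is deliberately not automated here.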
Transcriptions of examples (with a little surrounding context). Cases in
which the -z is actually a form of -m are indicated by [m] in brackets.
----------------------------------------------------------------------
Example 1:
quasi forrilamina est. sz. viridis habet maculas albas.
Example 2:
mouent hoi~ez[m] ad audaciam ita &abquod; non timet morte~
Example 3:
&abquod; de+siderat & diu no~ credidi <ABBR>illd</ABBR>
Sz
post&abquam; legi libros
q~ &abquam;uis faciat <ABBR>illd</ABBR> r~putani&abus;
tn~ ai~az[m]
facere / ia~ publicu~
Example 4:
ad efficacia~ ad oe~z[m] re~ &abquod; vult homo
& sunt <ABBR>secund</ABBR> duos modos / vno scz iam
dicto.
scz <ABBR>scdm</ABBR> affectione~
Example 5:
Et quando submergis Azoubor moritur. & roretur super ipsum
acetum: vinificatur Et
Example 6:
super ipsum aliquid ex ziniar. deinde funda super illud ex illa
pinguedine liquefacta
Example 7:
imagines & la+pides: accipe ziniar & tere bene. &
accipe
pannum funeris
Example 8:
locu~ folis & lune in zodiaco quolibet die:
3. I/J
We do *NOT* globally change upper-case J to I. That suggestion was
removed from the keying instructions very early on (years ago) and replaced
with the more sensible rule that says: record "I" as "I" and "J" as "J";
but if they are formally indistinguishable, use "I".
Question: There was an I/J issue in my last text (Ws2599, vid 53988). However,
I could only tell the "I"s were meant to be Js because this text happened to
have an italicized true I which I happened to see. The same character is used
interchangeably in other texts. Are we now meant to try to find out how the I/J
character is being used in each text and correct accordingly? I didn't correct
them in this case, and I think here in Oxford we favour keeping them as I.
Don't want to cause a transatlantic rift in keying practice though.
There are two issues: one is what the keyers should be doing (and how we
should keep them to it). The other is what we should do in the way of correction
if they get it wrong.
For the former, we don't see that I/J is any different from
any other character recognition problem: that is, we expect the keyers to
honour the distinctions made by the typeface(s) used in the book in hand,
regardless of how the form is used elsewhere. Only if there is no local distinction,
or if the distinction is murky or hard to discover (e.g., the true "I" or
true "J" is rare), should they be content to drop down to the "everything
is I" fallback strategy.
As for the latter, we hope we can persuade the vendors to
get it right every time, and make this just a temporary issue. In the meantime,
the suggested approach depends heavily on the familiarity with a particular
book that you gain from proofing. For example, we've favored a more or less
global change of I[vowel] to J[vowel] (after viewing a list of the words
affected) only if during proofing it becomes clear that (1) the book uses
only typefaces that maintain the I/J distinction; (2) the book uses modern
conventions for distributing I and J (vowel/glide); and (3) the keyers have
got it thoroughly wrong. The global approach (or semi-global, done by
turning a list of affected words into a set of perl substitutions) is the
only one that we'd recommend in most cases, since anything else is usually
incredibly time-consuming.
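The semi-global approach might be sketched as follows. We use Python here as a stand-in for the perl substitutions mentioned above; the function names and the sample sentence are ours, for illustration only. Each reviewed word becomes one whole-word substitution, so unreviewed words such as "In" or "Italy" are never touched.

```python
import re

def build_substitutions(reviewed_words):
    """Turn reviewed I-initial words into (pattern, replacement) pairs."""
    subs = []
    for word in reviewed_words:
        if not word.startswith("I"):
            continue  # only I-initial words are candidates
        # Match the word only when it is not part of a longer run of letters.
        pattern = re.compile(r"(?<![A-Za-z])" + re.escape(word) + r"(?![A-Za-z])")
        subs.append((pattern, "J" + word[1:]))
    return subs

def apply_substitutions(text, subs):
    """Apply each whole-word substitution in turn."""
    for pattern, replacement in subs:
        # A function replacement keeps the new text literal.
        text = pattern.sub(lambda _m, r=replacement: r, text)
    return text

reviewed = ["Iustice", "Iohn", "Iudge"]                    # list after review
sample = "Iohn spake of Iustice; In Italy the Iudge sat."  # made-up sample
print(apply_substitutions(sample, build_substitutions(reviewed)))
```

The point of working from a reviewed list rather than a bare pattern is that every change has been seen by a proofer before it is applied.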
This is not to say that no repair is possible when these conditions do not
all hold: e.g., if the keyers get it wrong only in italics, it may be
possible to isolate the cases that occur in italics; if the book is very
small, it may be possible to repair all the examples individually. Worst
are the cases in which a book has several typefaces, one of which has no
I/J distinction (e.g. a blackletter) and one of which does (e.g. a roman),
but the keyers have ignored the distinction. We've left those as
they are, since the only way to fix them would be to go case by case.
You can find some typical examples from three books (one done by each firm)
on the web at
http://www.lib.umich.edu/eebo/docs/dox/ij.html
Question: Should we be trying to find examples of I and J in every book and
comparing them to check that the vendor has done this correctly? We may not
notice any problem when proofreading, unless examples of I and J happen to
occur close together on the page.
The problem is largely confined to italic; italic J is *always or almost
always different from italic I.* (It's only in blackletter that they merge.)
So if you're proofing along and find "Iustice" where the book *appears* to
say "Justice", the odds are that there is a problem, and that it really does
say "Justice". It takes about five seconds (say, by searching for
<HI>I[^aeiou]) to confirm that
there is in fact a distinct form for "I". So the issue is not whether to
take the trouble to check ('cause it's really no trouble), but whether we'd
rather know or avert our gaze! We'd rather know. Then we can at least proceed
with our eyes open.
Once we know, things admittedly get a bit dicier. In a typical example,
it took the aforementioned five seconds (well, ok, ten) to see that there
was a problem: all one had to do was search for <HI>I[aeiou] and <HI>I[^aeiou]
and see if the forms were different. They were. But it took about 5 minutes
looking at a dozen examples to confirm that the book appears to be consistent
in its modern I/J practice, and about 30 seconds to pull a list of words
beginning with I[vowel], to see if all of them would be J-initial in modern
practice. They probably would:
Iesus
Iews
Iohn
Iu|stice
Iudas
Iudg
Iudg|ment
Iudge
Iudge|ment
Iudgement
Iudgment
Iupiter
Iust
Iustice
Iustif
Iustif$|ing
Iustification
Iustinus
If we did a global change, these are the words that would be affected. But
then what? Assuming that one cannot check each one, how do we feel
about doing that approximate global change? For example:
change \([> ]\)I\([aeiouy]\) to \1J\2 and
^I\([aeiouy]\) to J\1
This would produce (in this book) 79 changes, would change all the
words listed above to "J-"; and would thereby change most (not necessarily
all) "I"s into the correct "J"s, at the risk of changing some that are really
printed as "I" and also at the risk of not changing some that are really
printed as "J" but didn't get caught by the change. This inexact result could
arguably be considered an improvement.
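For anyone who would rather preview the change before committing to it, here is a rough Python rendering of the two substitutions above, folded into one pattern. It is a sketch only: the function names and the sample text are ours, and the count it reports is for whatever text is passed in, not the 79 mentioned for this book.

```python
import re
from collections import Counter

# "I" + vowel at the start of a line, or after ">" or a space.
I_BEFORE_VOWEL = re.compile(r"(^|[> ])I([aeiouy])", re.MULTILINE)
# Same context, but capturing the whole word for the preview list.
I_WORD = re.compile(r"(?:^|[> ])(I[aeiouy][\w|$]*)", re.MULTILINE)

def affected_words(text):
    """Count the words the change would touch, for review beforehand."""
    return Counter(I_WORD.findall(text))

def i_to_j(text):
    """Apply the change; return the new text and the number of changes."""
    return I_BEFORE_VOWEL.subn(r"\1J\2", text)

# Made-up sample text for illustration.
sample = "<HI>Iustice</HI> and Iohn. It is Iudas\nIesus wept"
print(sorted(affected_words(sample)))
new_text, changes = i_to_j(sample)
print(changes, "changes")
print(new_text)
```

Like the original substitutions, this is approximate: it changes every I[vowel] in those contexts whether or not the book really prints a J there, and it will miss a J that is keyed as "I" in any other context.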
4. Illegibilities
To recap our policy about capturing illegible words, letters, lines, etc.:
Use:
$ ($$ $$$ etc.) -- when individual letters are obscured, but at least
a rough count can be obtained. We would rarely use this for more than (say)
six letters. Above six letters or so we tend to use either:
$word$ -- for cases in which at least a rough estimate of words can be obtained
(again, not more than six or so); or
$span$ -- for cases in which no count of words can readily be obtained, either
because there are a lot of illegible words (i.e. about six or more), or because
the passage is so damaged that a word count is not possible.
In practice, these three markers -- bare $, $word$, and $span$ -- represent
better than 90% of the illegibility that we find.
More rarely, a whole line or two of text is obscured, in which case we would
use $line$ (this happens most often at the top or bottom of a page, where
lines tend to get cropped off); likewise $page$ for an entire page, or even
$para$ for a single paragraph.
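To see how the markers are distributed in a given transcription, one could tally them with something like the Python sketch below. The function name and the sample line are ours, and it assumes the markers appear in the text exactly as written above, with a run of bare $ counted as one letter-level gap.

```python
import re
from collections import Counter

# Named markers are tried first; any other run of $ counts as letter gaps.
MARKER = re.compile(r"\$(word|span|line|page|para)\$|(\$+)")

def tally_illegibility(text):
    """Count each kind of illegibility marker found in the text."""
    counts = Counter()
    for match in MARKER.finditer(text):
        if match.group(1):
            counts["$" + match.group(1) + "$"] += 1
        else:
            counts["$ (letters)"] += 1
    return counts

# Made-up sample line for illustration.
sample = "Iustif$ing the $word$ of $$$ men $span$ and $line$"
for marker, count in sorted(tally_illegibility(sample).items()):
    print(marker, count)
```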