Revision History
This symbol: links to sample pages that illustrate a given feature.
The data-conversion vendor will return keyed and coded text files transcribed from the page images.
Transcriptional accuracy will be 99.995% or better (error rate of 1 character/byte in 20,000). We will test and if necessary reject data by the shipment.
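The acceptance threshold above is simple arithmetic, and a minimal sketch of the per-shipment check might look like the following. The function name `shipment_passes` and the idea of counting errors in a proofread sample are assumptions made for illustration; the guidelines do not specify how the test is carried out.

```python
def shipment_passes(chars_checked: int, errors_found: int) -> bool:
    """Return True if the sample meets the 99.995% accuracy target.

    The target is at most 1 wrong character in 20,000. Integer
    arithmetic is used so the boundary case is exact.
    """
    if chars_checked <= 0:
        raise ValueError("chars_checked must be positive")
    if errors_found < 0:
        raise ValueError("errors_found cannot be negative")
    # errors / chars <= 1 / 20000  is equivalent to  errors * 20000 <= chars
    return errors_found * 20000 <= chars_checked
```

For example, 10 errors in a 200,000-character sample sits exactly on the limit and passes, while 11 errors in the same sample fails.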
Coding will be valid SGML, validated against the supplied dtd or a true subset thereof. This dtd is an XML-compliant extract from TEI and uses TEI semantics; the TEI guidelines (TEI P3) may be safely used as a general guide to the meaning of particular tags.
The vendor may, at its discretion, reject as much as 10% of the books submitted for conversion if they are deemed impossible to convert accurately. Valid reasons for rejection (which must be stated) include: (1) excessive abbreviation, and (2) illegible text (due to poor image or print quality).
Changes. We recognize the need for consistency, and the expense entailed in changing instructions and procedures midstream; such changes will certainly be minimized. Nevertheless, there is certain to be unexpected material in the data, and there are certain to be unforeseen consequences to some of the instructions given here. These instructions, as well as the eebo_sgm.dtd, will therefore undoubtedly undergo some revision during the course of this project, most of it probably towards the beginning.
Exceptions. There is considerable variety in the source material and minor special instructions may be required for some books, or some portions of books, in some cases overriding the instructions given below.
Feedback. Conversion firms involved in this project are encouraged to ask questions: both to inquire about specific features not anticipated by the Guidelines, and to challenge the Guidelines (or the dtd) if they seem to produce unreasonable results. We will likewise provide advice on the conversion firms' tagging practices as quickly as possible.
The beginning of each page (including the first page and all blank pages) should be recorded with a <PB> tag. The REF attribute of the <PB> tag is required: its value will eventually be the ID-string of the page image used by Bell & Howell to retrieve the image. Until such ID strings are available, simply record the image number within the book as the value of the REF attribute. E.g., a page appearing on the third page image will begin with <PB REF="3">.
Since most of the page images are in fact images of page openings (i.e., each image contains two pages), in most cases there will be two <PB> tags for each REF value, like this:
<PB N="6" REF="3">
<PB N="7" REF="3">
<PB N="8" REF="4">
<PB N="9" REF="4">
<PB N="10" REF="5">
<PB N="11" REF="5">
The text captured from each book should be returned as a single file, *.sgm. For now use the method of file-naming specified on the EEBO FAQ page, using the Wing or Pollard & Redgrave STC number as the basis for the file name.
With a few standard exceptions noted below, the entire text will be recorded in its entirety, first page to last, in the order it was intended to be read (top left to bottom right, left column before right column, etc.).
The chief exception is parallel texts. Running parallel texts, printed in a multi-column, multi-row, or facing-page arrangement, or some combination thereof, need to be treated as separate texts (normally, separate <DIV>s, sometimes perhaps separate <TEXT>s), each one recorded until its end and not restarted on each page. Notes and other material relating to only one of the texts on a page need to be embedded in that text, not in any of the others. If a single heading or figure applies to more than one of the parallel texts, it should be recorded at the appropriate place in each text to which it applies.
Partial or fragmentary parallel texts will normally be broken primarily at the chapter or section level (e.g. <DIV1 TYPE="chapter">), then into parallel versions of that chapter (e.g. <DIV2 TYPE="version">) when necessary. But full parallel texts (e.g. an entire Latin-English parallel New Testament, or a Latin-English parallel Boethius) will normally be broken primarily into versions first (<DIV1 TYPE="version">), then each version into its chapters (<DIV2 TYPE="chapter">).
All material should be recorded in the form in which it appears in the book: do not attempt to correct spelling or typographic errors (except upside-down letters; see below). Spaces between words should always consist of one space character. Spacing between words is, however, often highly irregular in these books, often difficult to discern, and therefore often requires a measure of judgment. This may involve advisedly departing from the spacing that appears in the original book when sense demands it.
Page numbers as printed in the book will be preserved only as the value of the "N" attribute of the <PB> (page-break) tag. Unnumbered pages should receive a <PB> tag with the N attribute omitted. Incorrect page numbers should be recorded just as they appear. Page numbers will usually consist of arabic or roman numerals, but may also appear as letters or letter-number combinations. If there appear to be multiple separate paginations, choose one to record with the <PB> tag; record the other with a <MILESTONE> tag. Ignore any typographic elements used to set off the page number. E.g. -2-, {p. 2}, and PAGE 2 should all be recorded as <PB N="2">; (ccii) .cc.ii. and -ccii- should all be recorded as <PB N="ccii">; etc.
Placement of <PB> tags. The rules are: (1) "pages always break at the top"; that is, <PB> tags will be inserted in the text at the actual location of the page break, regardless of the location of the page number on the printed page. (2) "Divisions begin at page breaks; they don't end there"; that is, if a structural break of some kind coincides with the page break (e.g., if a new section, paragraph, etc., begins at the head of the new page), the <PB> tag should be tucked inside the opening tag for the new division, neither inside the closing tag for the old division nor between the two divisions. And (3) "Words cannot break at page breaks"; that is, if a hyphenated word straddles a page break, finish the word and any attached punctuation, then insert the <PB> tag. Treat the hyphen as any other end-of-line hyphen.
In parallel texts, material on a single page is often recorded at widely separated points in the data stream (once in each parallel <DIV>). In that case, the <PB> tag, including the page number, should be repeated, i.e., recorded in both <DIV>s.
Foliation. Some books may be foliated instead of paginated, i.e., every leaf may receive a number, rather than every page (in which case, typically, the back page of each leaf has no number). Record a foliated book in the same way as a paginated book, supplying the folio number as the value of the "N" attribute of the <PB> tag. A typical page sequence in this kind of book will look like this:
<PB N="iij"> <PB> <PB N="iv"> <PB> <PB N="v"> <PB>
The folio number may be explicitly labeled as such ("Fol.xvii." or "Folio .cxli."). Discard the label and punctuation and record only the actual number (<PB N="xvii"> <PB N="cxli">).
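The rules above for stripping typographic ornament and "page"/"folio" labels from printed numbers can be sketched as a small helper. This is only an illustrative heuristic, not part of the guidelines: the function name `pb_tag`, the list of recognized labels, and the attribute order are assumptions, and an unusual letter-number pagination (for instance a page number that itself begins with a detached "P") would still need human judgment.

```python
import re
from typing import Optional

# Assumed set of labels to discard; extend as the material requires.
_LABEL = re.compile(r"^\W*(?:page|folio|fol|pag|p)\b\W*(?=\w)", re.IGNORECASE)


def pb_tag(ref: str, printed: Optional[str] = None) -> str:
    """Build a <PB> tag from an image reference and a printed page number.

    The printed number is kept in the form it has in the book (arabic,
    roman, letters); only labels and surrounding punctuation are removed.
    An unnumbered page gets a <PB> tag with the N attribute omitted.
    """
    core = ""
    if printed:
        core = _LABEL.sub("", printed.strip())      # drop "p.", "PAGE", "Fol." ...
        core = re.sub(r"[^0-9A-Za-z]", "", core)    # drop dashes, dots, brackets
    if core:
        return f'<PB N="{core}" REF="{ref}">'
    return f'<PB REF="{ref}">'
```

Called as `pb_tag("3", "{p. 2}")` this yields `<PB N="2" REF="3">`, and `pb_tag("9", "Folio .cxli.")` yields `<PB N="cxli" REF="9">`. Incorrect page numbers pass through unchanged, as the guidelines require.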
Other non-structural numerations and alternative numerations. If the book contains some other running numeration system alongside folio or page references, use the milestone element to record it, and use its form, recorded with the "rend" attribute, to distinguish it from other milestones. There is no need to interpret its meaning or decide on its "unit" value, unless it is clear what that is. For example, if a book contains an unexplained sequential series of numbers (perhaps in brackets) in its margins, record them as <MILESTONE N="">; if it contains an explained series of numbers in the margins, use the accompanying explanation as the "REND" value, insofar as that is practical: if an edition, for example, contains a series of sequential references in the margin that look like this: [Boeth., cap. 43], record them as milestones like this: <MILESTONE REND="Boeth." UNIT="cap." N="43">; if a book contains a series of sequential references in the margin that look like this: "*Chapter 4 in the Greek," record them like this: <MILESTONE REND="in the Greek" UNIT="chapter" N="4">. Note that this applies only to a sequence; occasional notes of this sort should be recorded simply as <NOTE>. If in doubt whether a set of numbers represents <MILESTONE>s or <NOTE>s, use <NOTE>. Some books contain conflicting structural enumerations, e.g. a system of proposition numbers in the margins that does not correspond with the chapter numbers; the former may be recorded using <MILESTONE> tags.
Line numbers in verse should be recorded only as the value of the "N" attribute of the <L> tag. Record in this fashion only line numbers actually printed in the book, and use the form of the number that appears in the book. (Line numbers in prose should usually be regarded as non-structural and recorded as milestones, as above.)
Stanza, chapter, section numbers, etc. (that is, sequential numbers that appear in the headings to <LG>s and numbered <DIV>s) should be included as they appear in the book as part of the text surrounded by the appropriate <HEAD> tag, but should also be recorded, if possible as an arabic number, as the value of the "N" attribute of the appropriate <DIV> or <LG> tag.
<DIV2 TYPE="chapter" N="5"><HEAD>Chapter V.</HEAD>
<LG N="14"><HEAD>Stanza XIV.</HEAD>
Paragraph numbers (sequential numbers appearing at the beginning of a series of paragraphs that you have not chosen to regard as <DIV>s) should be included as they appear in the book as part of the text surrounded by the <P> tags, but should also be recorded, if possible as an arabic number, as the value of the "N" attribute of the <P> tag.
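Where a heading carries a roman numeral, the arabic value for the "N" attribute can be derived mechanically. The sketch below is an assumption-laden illustration: the name `roman_to_arabic` is hypothetical, and the treatment of "j" as a final "i" and "u" as "v" reflects common early printed forms rather than anything stated in these guidelines.

```python
_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_arabic(numeral: str) -> int:
    """Convert a roman numeral such as "XIV" or "iij" to an integer.

    Raises KeyError if the string contains a character that is not a
    roman digit, so that doubtful cases are left to the keyer.
    """
    # Normalize case and the early-modern letter forms j (= i) and u (= v).
    digits = numeral.lower().replace("j", "i").replace("u", "v")
    total = 0
    for index, char in enumerate(digits):
        value = _VALUES[char]
        following = _VALUES[digits[index + 1]] if index + 1 < len(digits) else 0
        # A smaller digit before a larger one is subtracted (iv, ix, xl ...).
        total += -value if value < following else value
    return total
```

So the heading "Chapter V." would give N="5" and "Stanza XIV." would give N="14", while the heading text itself is still recorded exactly as printed.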
Item numbers and label numbers in lists should be recorded as part of the text included within the <ITEM> (or <LABEL>) tags. They should not be recorded as attribute values. See below under "Lists and Tables."
Enumerations in tables may be variously treated: given a column of their own, left as part of the text in a row, or even made part of an embedded <LIST>, whichever adequately represents the information most efficiently. See below under "Lists and Tables."
Language. Supply a value for the LANG attribute of numbered <DIV>s and of whole <TEXT>s, but do so only if the bulk of the text (barring notes) in that <DIV> or in that <TEXT> is in the indicated language. Supply the attribute at the highest level at which it applies: e.g., if an entire text is in Latin, add LANG="lat" to the <TEXT> tag, but not to all the <DIV> tags within that <TEXT>; if one of the <DIV1>s in a text is in Latin and another is in English, assign LANG="lat" to one of the <DIV1>s and LANG="eng" to the other; and so on.
Assign multiple LANG values to the same <DIV> or <TEXT> only if it contains two or more languages in some kind of organized relationship. E.g., a bilingual Latin/English dictionary should be coded as <TEXT LANG="lat eng"> (with a space between the two codes). Supply a value for the LANG attribute only if you are sure what language it is; otherwise, do not use the attribute at all. Use the USMARC 3-letter language codes published by the Library of Congress at http://lcweb.loc.gov/marc/languages/ (these are identical to the 3-letter codes contained in the ISO standard 639-2; see http://lcweb.loc.gov/standards/iso639-2/langhome.html). Do not attempt to differentiate between forms of the same language: e.g., record LANG="fre" for French texts and LANG="eng" for English ones, not LANG="frm" ('Middle French') or LANG="enm" ('Middle English').
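As a rough sketch of the LANG rules above, the helper below builds the attribute string from a list of 3-letter codes, folding the two historical forms named in the text into their base language. The function name `lang_attribute` and the mapping table are illustrative assumptions; only the two mappings given in the guidelines are included, and the helper does not check codes against the Library of Congress list.

```python
from typing import Iterable

# Only the two examples named in the guidelines; extend with care.
_BASE_FORM = {"frm": "fre", "enm": "eng"}


def lang_attribute(codes: Iterable[str]) -> str:
    """Return a LANG="..." attribute, or "" when no language is certain."""
    seen = []
    for code in codes:
        code = _BASE_FORM.get(code.lower(), code.lower())
        if code not in seen:          # keep first-seen order, drop repeats
            seen.append(code)
    if not seen:
        return ""                     # unsure of the language: omit the attribute
    return f'LANG="{" ".join(seen)}"'
```

A bilingual dictionary would be passed as `["lat", "eng"]`, producing `LANG="lat eng"` with the codes separated by a single space.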
TYPEs of DIV. Supply a value for the TYPE attribute of numbered <DIV> elements if the appropriate value is obvious; otherwise, omit the attribute entirely. If you do supply a value, use these rules:
If the designation in the book is a verbose version of a common English term, use the simpler form. E.g., if the book says "Prefatory Remarks by the Author," you shouldn't be afraid to translate this into <DIV1 TYPE="preface">
Otherwise, use whatever is there.
<DIV1 TYPE="poem">
See further under Poetry, below.
MS. Any page that contains handwritten corrections, deletions, glosses, etc., should have the "MS" attribute of the <PB> tag for that page set to "Y".
Provide other attribute values only when instructed to and when there is specific information to supply. Do not supply values of this sort: TYPE="unknown" or TYPE="unspecified".
Transcribe as: I charge the<GAP DESC="damage">ame of our Lord Iesus [...] he shall come to <GAP DESC="damage">quick and the dea [...] thou peruse this copie<GAP DESC="damage">ligently cor@@ct it [...] that thou put too likewise<GAP DESC="damage">ge, and set it
Surrounding structures should be preserved. A line of verse quoted in Greek, for example, should be recorded as
<Q><L><GAP DESC="foreign"></L></Q>
Record as: the semicircle .18.5, <GAP DESC="foreign"> .21.7, <GAP DESC="foreign"> .23
One text or many? Most works will consist of a single <TEXT> containing a single <BODY> element (optionally also a <FRONT> and/or <BACK> element for front and back matter respectively). Some works will consist instead of a <GROUP> element that contains multiple <TEXT>s (each <TEXT> with its own <BODY> and, optionally, <FRONT> and <BACK>). The GROUP element will be used most frequently for items that contain several works published or bound together, each with its own title page, that were originally printed separately, e.g. the collected works of an author.
The <BODY> (and, if necessary, the <FRONT> and <BACK> elements) will normally be divided into numbered <DIV>s corresponding to the main divisions of the text. Very simple documents, on the other hand, with no internal division (a work consisting of a single poem, for example) do not require <DIV>s at all: use no more <DIV> layers than necessary.
The numbered <DIV> elements, from <DIV1> to <DIV7>, represent a hierarchy: the <BODY> is subdivided into <DIV1>s; <DIV1>s, if necessary, are subdivided into <DIV2>s, and so on. <DIV>s divide into parts: with few exceptions, you need to have more than one of something to call it a <DIV>.
Individual small texts embedded within a larger work (e.g. entire poems quoted within a chapter of a treatise) should usually not be tagged as <DIV>s but should instead be placed within <Q> tags. The <Q> element may if necessary contain an entire <TEXT>, with its own <BODY>, <FRONT>, <BACK>, numbered <DIV>s etc.
Useful clues to the DIV structure include:
In general, these are not sufficient to establish a <DIV> and should instead be recorded as ordinary text. Numbered paragraphs, for example, should simply retain the number as part of the paragraph (and as the value of the "N" attribute of the <P> tag), but there is no need to call the number a <HEAD> and therefore make the <P> a <DIV>.
<P N="3">¶ III. In the third place, the Calvinist partie striveth ...
Marginal "headings" that you decide not to treat as <HEAD>s can usually be encoded either as <NOTE>s, with the PLACE attribute set to "marg" or (if they contain a sequential numeration), as <MILESTONE>s.
TYPES of DIVs. See above under "attributes."
Front matter (material to include in the <FRONT> element) typically includes title pages, dedications, tables of contents, prefaces, prologues, honorific poems, remarks "to the Reader", etc., each of which should be recorded with a numbered <DIV>, their subsections recorded with higher-numbered <DIV>s, etc.
Title pages do not require special tags. Each title page should be recorded as a numbered <DIV> within the <FRONT> element. Include both the front and back (recto and verso).
Back matter (material to include in the <BACK> element) typically includes indexes, glossaries, colophons, afterwords, appendices, etc., each of which should be recorded with a numbered <DIV>, their subsections recorded with higher-numbered <DIV>s, etc.
Do not attempt to record the physical appearance of the page (centering, extra spaces, justification, type face, type size, etc.), though such cues may and should be used to determine the beginning and end of divisions within the text, the distinction between text and notes, etc. On type faces, see the special instructions below about use of the <HI> tag.
Record line-breaks (with the <LB> tag) only (1) if the text is unintelligible without a break; and (2) if there is no intervening structural tag. Many times, it is better to repeat a tag than to insert a line break in the middle of one; but more often it is possible to get by without doing either, especially if there is any punctuation at the line break. E.g., record this:
CHAP. XI.
Some Advantages and Helps for raising and affecting the Soul by Meditation.
like this:
<HEAD>CHAP. XI.</HEAD> <HEAD>Some Advantages and Helps for raising and affecting the Soul by Meditation.</HEAD>
or, better, like this:
<HEAD>CHAP. XI. Some Advantages and Helps for raising and affecting the Soul by Meditation.</HEAD>
but NOT like this:
<HEAD>CHAP. XI. <LB>Some Advantages and Helps for raising and affecting the Soul by Meditation.</HEAD>
See below for the special case of prose interrupted by an interlinear gloss.
Paragraph breaks should be recorded with <P> in prose and with <LG> (line-group or stanza) in verse.
Do not record italic or bold type, the various kinds of black-letter ("gothic") typefaces, regular roman typefaces, or fonts of different sizes as such. Instead record every change from the predominant typeface with the <HI> tag, unless you use that change as a cue to insert a structural tag of some kind. For example, a book may have black-letter text and italic headings. Record the headings as <HEAD> ... </HEAD>, not as <HEAD><HI> ... </HI></HEAD>, since you have used the change to italic as a cue to tag the italic text as a <HEAD>
Predominance is established at the <DIV> level. E.g., if the Preface or Dedication or chapter or section (occupying its own <DIV>) is in italic, it needs no special tagging, even if the main body of the book is in some other typeface. But if an individual word, phrase, sentence, line, or paragraph is in some other face than that which is predominant in that <DIV>, then mark the "different" text with the <HI> tag.
The exception, of course, is again if you are using the change of type face as a cue to structural role: in a book that prints its text in roman and its notes or block quotations in italic, once you have recorded the italic text as a <NOTE> or a <Q>, you do not need to mark it also as <HI>. Instead, the italic type itself becomes the predominant form within the <NOTE> or within the <Q>; any changes of typeface within these tags (e.g. a single word in textura black-letter) should be recorded with <HI>
If the text switches to yet another typeface within a section flagged with <HI>, simply mark the new typeface with another (nested) <HI> tag.
The most common contrasting type forms may be described as: (1) roman; (2) italic; (3) textura; (4) rotunda; (5) bastarda (see letter samples below), but individual books may use other contrasting forms: subtypes of italic; changes of font size; etc. The general appearance of the book must be the key: if the book intends two kinds of type to contrast, then flag the change with <HI> (as instructed above).
EXCEPTION. Many books use a "diminuendo" effect both in headings and in the beginning of text divisions: for two or three lines of text, each line is smaller than the one above it, and is sometimes in a different typeface as well. This is simply decorative, and can usually be ignored; i.e., if this is clearly what is going on, do not code the lines of contrasting appearance with <HI>.
Do not record changes of typeface within a word (e.g. a single letter or two in another typeface within a word that is otherwise in the typeface used in the immediate context).
When punctuation coincides with the end of a span marked by the <HI> element, and there is doubt as to whether the punctuation belongs inside or outside the closing tag, place it within the closing </HI> tag:
<HI>Sillepsis,</HI> or the Double supply.
If two adjacent spans of text are in two different typefaces, both of which contrast with the predominant face as well as with each other, record the two spans with two separate sets of <HI>..</HI> tags.
Record superscripted and subscripted text using the keyboard "circumflex" or "caret" character (^ = DECIMAL 94, HEX 5E) before each superscripted character (^a;, ^b;) and the same "caret" character doubled (^^) before each subscripted character (i.e., ^^a;, ^^b;, etc.).
Record ornamented capitals, large drop caps, etc., as ordinary capital letters.
Record "small caps" as ordinary capital letters.
Record vertical text (text printed perpendicularly to the main text) as if it were horizontal.
<Q>s are used for block quotations, whether of prose or verse. Don't use them for ordinary "inline quotations."
"Block quotations" include both quotations that are set off from the main text by indentation and blank lines (in the modern fashion) and also lengthy quotations that are set off by the use of other typographic cues, such as a series of quotation marks in the margin, or (if unambiguously marking a block quotation) a change of typeface, or some combination of these. If you're not sure if a block of text is a <Q>, simply record the appearance of the text (using, e.g. <P> and <HI>).
<Q>s are usually the best way to tag even very substantial items embedded in prose, e.g. a poem or a document of some kind quoted within a chapter, or within a note, or within an introduction.
<Q> can if necessary even contain an entire <TEXT>, with its own <FRONT> matter, <BODY>, <DIV> structure, and so on. Use <Q> for such embedded items, rather than trying to treat them as <DIV>s of the main text (unless that's really what they are). Treating them as <DIV>s forces you to treat all the material surrounding them as <DIV>s too, at the same level.
Prefer this:
<DIV1 TYPE="introduction">
<P>blah blah</P>
<P>blah blah</P>
<P> <Q>here's a poem</Q> </P>
<P>blah blah</P>
</DIV1>
to this:
<DIV1 TYPE="introduction">
<DIV2 TYPE="stuff before the poem">
<P>blah blah</P>
<P>blah blah</P>
</DIV2>
<DIV2 TYPE="poem">
<LG><L>here's a poem</L></LG>
</DIV2>
<DIV2 TYPE="stuff after the poem">
<P>blah blah</P>
</DIV2>
</DIV1>
Block quotations accompanied by citations should record the quotation within <Q> tags and the citation within <BIBL> tags; the <BIBL> will normally be placed within the associated <Q>.
Most material that is set off from the main body of the text but is adjacent and related to it can be safely tagged as <NOTE>. (But arguments (summaries at the head of <DIV>s), salutations, and speaker names and stage directions in drama are among the note-like features that have their own tags.)
Record each note at the point in the main text to which it relates, set off by appropriate tags, not at the point where it appears on the page.
A note that spills onto the next page needs to be treated as a single note, not two, and should be placed in the text where it applies.
If the note points to a place in the text which is marked with a flag of some kind (e.g. a footnote reference number, an asterisk (*), etc.), discard this marker from the note once it has served its purpose by locating the <NOTE> in the right place in the text. The corresponding character in the text itself should also be removed, but not completely: it should be preserved as the value of the "N" attribute of the <NOTE> tag. Notes that use non-alphabetical symbols such as "daggers," section-marks, paragraph marks, etc., should preserve those characters too if possible, using character entities, like this: <NOTE N="†">. If the character is not recognized as corresponding to a readily available character entity, supply "#" or "$" as the value, using the rules given below for unrecognized symbols.
If the note is keyed to the text by line number, verse number, etc., place the note at the end of the line (etc.) to which it applies, and discard the literal number from the note.
Use the "PLACE" attribute of the <NOTE> tag to indicate where the note appears on the page:
- PLACE="marg" in margin or adjacent to the text (even if part of it runs across the whole page because of lack of room in the margin)
- PLACE="foot" in a footnote, below the text
- PLACE="inter" interlinearly (between the lines of text)
- PLACE="inline" not distinguished from the main text by location.
If there are multiple distinguishable sets of notes in the same location (two sets of footnotes, for example; or multiple sets of marginal notes marked by different kinds of flags, one set marked by numbers, one by letters), distinguish them by appended numbers: PLACE="foot1" and PLACE="foot2" for example.
Example of book with two sets of marginal notes, one keyed to letters, one to numbers; record them as <NOTE PLACE="marg1"> and <NOTE PLACE="marg2">
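The PLACE values listed above, together with the numbered variants for multiple note sets, lend themselves to a simple consistency check on returned files. The following is a minimal sketch under the assumption that the only legal values are the four names given here, optionally followed by a positive integer; the function name `is_valid_place` is hypothetical and the dtd itself may be more or less permissive.

```python
import re

# marg, foot, inter or inline, optionally followed by 1, 2, 3 ...
_PLACE = re.compile(r"(marg|foot|inter|inline)([1-9][0-9]*)?")


def is_valid_place(value: str) -> bool:
    """Check a NOTE PLACE value against the forms described in the guidelines."""
    return _PLACE.fullmatch(value) is not None
```

With this check, `marg`, `foot2` and `inline` are accepted, while spellings such as `margin` or `footnote` are flagged for review.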
Notes that apply to two (or more) distinct loci or lines should be reproduced and inserted at *both* (or all) the relevant points.
These need to be distinguished from notes that apply to a span of loci or lines; notes applying to a span of lines should be placed after the last line in the span with indications of the length of the span (e.g., "14-23" [with reference to line numbers] or "*-*" [with reference to two "*" flags in the text]) retained.
A note that appears next to a single verse line or set of lines and seems to relate to that line (or set) should be placed at the end of the line(s) in question.
A note that relates to a specified group of lines, verses, etc., should be moved into the text at the end of the last item to which it applies. If there are line numbers, the line number indication in the note should be preserved. If physical arrangement, rather than explicit line numbers, serves to specify the line or verse number range, and there are line numbers in the verse, supply the appropriate number range in brackets at the beginning of the note.
Notes referenced to a line (verse, etc.) number followed by "f." ("2365 f." meaning "line 2365 and following") should be treated as notes referenced to a span of two lines (in this case, 2365-66), that is, placed at the end of the second line (2366), with the full line reference preserved in the note: <NOTE PLACE="foot">2365 f.: ... </NOTE>
A note that seems to relate to an entire text division (e.g. a <DIV> or <P>) should be inserted at the beginning of the text that comprises that division, or at the end of the <HEAD> if that is more convenient (and if it has one). E.g. a marginal note applying to a paragraph as a whole may be inserted at the beginning of the paragraph. This occurs commonly in books that contain a running summary or set of running headers in the margins: if these are not treated as <HEAD>s or <ARGUMENT>s, they should be treated as <NOTE>s (PLACE="marg") and inserted at the beginning of the section to which they apply. If the summary is found centered at the head of the text proper (instead of in the margin), it should usually be given a tag of its own and tagged as <ARGUMENT> or <HEAD> (see below under "Heads").
A marginal note in a prose text that seems to apply vaguely to the material next to which it is placed should be inserted at the end of the nearest sentence (as marked by punctuation), or at some other break in the text if that seems more appropriate.
In the case of notes that supply bibliographic citations, similarity of wording between note and text may provide a clue as to the best place to insert the note, as in this example:
Democ.Instit. Antonius Demochares saith of him, that he was exiled Christ.relig. in the persecution under Diocletian, and that he returned from banishment after the death of Diocletian and Licinius, and recovered his Bishoprick again, where he continued until the reign of Iulian.
<P>Antonius Demochares saith of him,<NOTE PLACE="marg">Democ. Instit. Christ. relig.</NOTE> that he was exiled in the persecution under Diocletian, and that he returned from banishment after the death of Diocletian and Licinius, and recovered his Bishoprick again, where he continued until the reign of Iulian.</P>
Apparatus that relates generally to the material on a page, or for which the appropriate place cannot readily be determined, should be attached to the last line of text at the bottom of the page.
Reference numbers in the text that point to something other than a note (e.g. to some part of an illustration), or for which the target cannot be found, should simply be recorded as part of the text.
Passages of verse (especially 2 or more lines, quoted and arranged as verse) within a note will normally be most readily coded as a quotation (<Q>) containing <L>s or <LG>s, embedded within the <NOTE> element.
Notes comprising a running interlinear commentary or interlinear gloss pose special problems. See below.
In general, prefer to record itemized sequences as <LIST>s rather than <TABLE>s if possible. Use <TABLE> when the material cannot be readily understood without the spatial organization that tables provide.
Numbered sequences of items when the items themselves are blocks of text of considerable size (numbered paragraphs, for example) should not normally be treated as lists.
Complex lists (lists within lists) should be encoded with nested <LIST> tags:
<LIST>
<ITEM> .. </ITEM>
<ITEM><LIST>
<ITEM> .. </ITEM>
<ITEM> .. </ITEM>
</LIST>
</ITEM>
</LIST>
Treat any numbers that enumerate items in a list as part of the text of that item; record them neither with separate <LABEL> tags nor as attribute values. E.g.:
<LIST>
<HEAD>Sins</HEAD>
<ITEM>1. Avarice</ITEM>
<ITEM>2. Sloth</ITEM>
<ITEM>3. Pride</ITEM>
</LIST>
Lists of pairs may be tagged with the element pair <LABEL> and <ITEM> (in that order). If you use this option, you may omit any "leader" (e.g. a dot leader) between the paired items. E.g.:
THE PLAYERS' NAMES
The Prince...............Jn. Longfellow
The Pauper...............Thomas Goodrich
Joan the Tappester........Jack Smithson
<LIST>
<HEAD>THE PLAYERS' NAMES</HEAD>
<LABEL>The Prince</LABEL><ITEM>Jn. Longfellow</ITEM>
<LABEL>The Pauper</LABEL><ITEM>Thomas Goodrich</ITEM>
<LABEL>Joan the Tappester</LABEL><ITEM>Jack Smithson</ITEM>
</LIST>
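The conversion of a dot-leader list of pairs into <LABEL>/<ITEM> markup, as in the players' names example, can be sketched as follows. The helper name `pairs_to_list` is hypothetical, and the assumption that a leader is a run of two or more full stops is a simplification; real pages may use spaces, dashes or other leaders.

```python
import re
from typing import List


def pairs_to_list(head: str, lines: List[str]) -> str:
    """Encode "label.....item" lines as a <LIST> of <LABEL>/<ITEM> pairs.

    The leader itself is omitted, as the guidelines permit. A line with
    no leader raises ValueError so that it can be handled by hand.
    """
    out = ["<LIST>", f"<HEAD>{head}</HEAD>"]
    for line in lines:
        # Split once, on the first run of two or more dots; a single
        # abbreviation dot (as in "Jn.") is left untouched.
        label, item = re.split(r"\.{2,}", line, maxsplit=1)
        out.append(f"<LABEL>{label.strip()}</LABEL><ITEM>{item.strip()}</ITEM>")
    out.append("</LIST>")
    return "\n".join(out)
```

Applied to the three cast lines of the example, this reproduces the encoded list shown there, one pair per line.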
Tables should be recorded as you would using HTML tables, oriented by row, with the number of columns determined by the number of cells within the row. Use the spatial organization of the text to determine the number of rows and columns (not necessarily reflected in printed border lines). The ROWS and COLS attributes of the <CELL> tag should be used just like the ROWSPAN and COLSPAN attributes of the <TD> in HTML to indicate cells that extend across two or more rows or columns. Cells that contain a heading or label for a row (or column) should receive the attribute ROLE="label".
EEBO dtd | HTML equivalent |
---|---|
<TABLE> | <TABLE> |
<ROW> | <TR> |
<CELL> | <TD> |
<CELL ROLE="label"> | <TH> |
<CELL ROWS=""> | <TD ROWSPAN=""> |
<CELL COLS=""> | <TD COLSPAN=""> |
Particularly complex tables may be recorded (again as in HTML) with nested <TABLE> tags, i.e., a <TABLE> within a <CELL>, or by combinations of <LIST> and <TABLE>, i.e. a <LIST> within a table <CELL> or a <TABLE> within a list <ITEM>.
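The correspondence between the EEBO table tags and their HTML equivalents, given in the table above, can be made concrete with a short conversion sketch. This is an illustration only: the function name `eebo_table_to_html` is hypothetical, and the code assumes that every <CELL> has an explicit closing tag and that attributes are written exactly as in the examples here (upper case, double-quoted).

```python
import re

_TAG = re.compile(r"<(/?)(TABLE|ROW|CELL)\b([^>]*)>")


def eebo_table_to_html(sgml: str) -> str:
    """Rewrite EEBO <TABLE>/<ROW>/<CELL> tags as HTML <TABLE>/<TR>/<TD>/<TH>."""
    open_cells = []  # stack of "TD"/"TH", so nested tables close correctly

    def convert(match):
        closing = match.group(1) == "/"
        name, attrs = match.group(2), match.group(3)
        if name == "TABLE":
            return "</TABLE>" if closing else f"<TABLE{attrs}>"
        if name == "ROW":
            return "</TR>" if closing else f"<TR{attrs}>"
        if closing:
            return f"</{open_cells.pop()}>"
        # A label cell becomes a header cell; spans are renamed.
        html_name = "TH" if 'ROLE="label"' in attrs else "TD"
        open_cells.append(html_name)
        attrs = attrs.replace(' ROLE="label"', "")
        attrs = re.sub(r"\bROWS=", "ROWSPAN=", attrs)
        attrs = re.sub(r"\bCOLS=", "COLSPAN=", attrs)
        return f"<{html_name}{attrs}>"

    return _TAG.sub(convert, sgml)
```

The stack is what allows a <TABLE> nested inside a <CELL> to be handled: each closing </CELL> is matched to the most recently opened cell, whatever its role.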
Physical arrangements that cannot easily be accommodated by our simple table model (e.g., labels with text running vertically) may need to be adapted and adjusted until they fit; it is more important to preserve the relationships between the items in the table than to preserve its exact layout.
Tables that continue from one page to the next may be tagged as one continuous table, with an embedded <PB> tag, especially if its headings are not repeated on the new page. If the headings are repeated, it is usually easier to close the old <TABLE> and open a new one on the new page.
Here is a sample simple table (this one is simple enough that it could almost be done as a <LIST>). [For another example see ]
Recorded as:
<DIV TYPE="table">
<HEAD><HI>TABLE</HI></HEAD>
<HEAD>By this table, shall ye fynde the Epistles and Gospels, for the Son|daies, and other feastiuall dayes.</HEAD>
<P>FOR TO fynde them the sooner, shall ye seke for these capital letters, <HI>A, B, C D,</HI> whi|che stande by the syde of this boke alwaies, On or vnder the letter shall you fynde a crosse &cross;, where the Epistle or the Gospell begynneth, and where the end is, there shal ye find an halfe crosse, # And the fyrst lyne in this table is alway the e|pistle, and the seconde lyne is alway the Gospell.</P>
<TABLE>
<ROW><CELL ROLE="label" COLS="3">On the fyrst Sonday in Aduent.</CELL></ROW>
<ROW>
<CELL>Rom. xiii.</CELL>
<CELL>C</CELL>
<CELL>And for as muche as we knowe</CELL>
</ROW>
<ROW>
<CELL>Math. xxi.</CELL>
<CELL>A</CELL>
<CELL>Nowe when they drew nye vnto</CELL>
</ROW>
<ROW><CELL ROLE="label" COLS="3">On the second sonday in the Aduent.</CELL></ROW>
<CELL>Rom. xv.</CELL>
<CELL>A</CELL>
<CELL>what so euer thynges are writen</CELL>
</ROW>
<ROW>
<CELL>Luc. xx.</CELL>
<CELL>C</CELL>
<CELL>And there shall be signes</CELL>
</ROW>
</TABLE>
</DIV>
Headings at the head of text divisions and stanzas (<DIV>s and <LG>s) should be tagged as <HEAD>. Subheadings should be tagged as <HEAD TYPE="sub">.
Some headings have special tags (see below). If heading-like material doesn't fall clearly into one of these special categories, use simple <HEAD>. Incipits ("here begins a tract about sin") are typically recorded as <HEAD>s with TYPE="incipit".
"Idlenesse is lesse harmefull then vnprofitable occupation."
PUTTENHAM<EPIGRAPH> <Q>"Idlenesse is lesse harmefull then vnprofitable occupation."</Q> <BIBL>PUTTENHAM</BIBL></EPIGRAPH>
Epigraphs are a common place to find bits of non-roman script; record those bits with <GAP DESC="foreign"> as described above, but place the "foreign" portion inside the <EPIGRAPH> tag.
Commentaries and sermons frequently quote a passage of text at the beginning (or at the beginning of each division), then comment on it. Encode these passages as <EPIGRAPH><Q> ... </Q></EPIGRAPH>.
Material at the end of a text division that is set off from the main text is normally to be tagged as a <TRAILER> or <CLOSER>. <TRAILER> is the more general tag, used for material without such internal structures as datelines, salutations, or signatures. Typical <TRAILER>s include "Amen," "Finis," and explicits ("here ends the tract written by Master John Knox."). <CLOSER>, on the other hand, is the counterpart of <OPENER>; it is used when the concluding material includes lengthy or complex information, including datelines, salutations, or signatures, especially in letters. See Letters, below, for examples. Requests for prayer for the author's soul are typically recorded as <CLOSER>s.
Epigraphs and bylines can appear at the foot of a division as well as at its head (see above for a description of epigraphs).
Verse lines. Each verse line should be enclosed in <L> tags. Do not attempt to record the varying indentation of verse lines; pay attention to indentation only insofar as it indicates a stanza break or a "broken" line (see below).
Broken lines. Sometimes when a verse line is too long to fit on the page, its last word or two is placed (sometimes marked off with a bracket or parenthesis) at the end of the next line or at the end of the preceding line (wherever it fits best). Such detached bits of verse lines should be recorded at the end of the line to which they really belong.
Mary had a little lamb, [snow.
Its fleece was white as
<L>Mary had a little lamb,</L>
<L>Its fleece was white as snow.</L>
Groups of lines (<LG>s).
<DIV1 TYPE="poem">
<L>When the cat's away</L>
<L>The mice will play</L>
</DIV1>
<P>A stitch in time saves nine.</P>
<LG>
<L>When the cat's away</L>
<L>The mice will play</L>
</LG>
<P>Too many cooks spoil the broth</P>
<P>John walked along, chanting constantly:
<Q>
<L>When the cat's away</L>
<L>The mice will play</L>
</Q>
But no one noticed.</P>
<P>John walked along, chanting constantly:
<Q>
<LG>
<L>Red rover Red rover,</L>
<L>Come over Come over</L>
</LG>
<LG>
<L>The bird's on the wing,</L>
<L>The dog's had his fling.</L>
</LG>
</Q>
But no one noticed.</P>
Lines vs. line-groups. It is often unclear when a group of lines has enough organization to be called a stanza (line-group <LG>). If in doubt, err on the side of fewer line-groups rather than more. And be consistent throughout a particular poem, so that a particular structure is not sometimes tagged as a <LG> and sometimes left untagged. Clues to look at include, in decreasing order of significance:
<LG>s vs. <DIV>s. It is not always easy to distinguish between <LG>s and <DIV>s: both can have headings; both can nest to create a structural hierarchy. Metrical units (true stanzas) are always <LG>s; verse paragraphs of irregular length are frequently best recorded as <LG>s, especially if they are not consistently supplied with headings. On the other hand, <DIV>s should be used for line-groups big enough to have true titles, or to appear in tables of contents.
Groups of stanzas within a poem should receive a numbered <DIV> tag. In most cases, you will use only a single level of <LG> (no nesting), and treat it effectively as the lowest-level text division. Any grouping of stanzas is therefore recorded as a <DIV>.
Entire poems. Each poem will usually be recorded as a <DIV> of the appropriate number (<DIV1> etc.), with TYPE="poem". Don't try to distinguish between different kinds of poems, between poems and songs, etc. Any discrete item in verse is TYPE="poem". Poems may, of course be subdivided further into <DIV>s and <LG>s of various types. If a book consists of a single poem, then the <BODY> element constitutes the poem. If a poem is quoted within a prose context, it is usually easiest to treat it as a <Q>. See next.
Poetry mixed with prose. When poetry is truly interspersed with prose, and either the poetry is the predominant form, or there is no clearly predominant form, the prose should be recorded within <P> tags, the verse within <LG> tags. When poetry gives way to prose, close the <LG> and open a <P>; when prose gives way to poetry, close the <P> and open an <LG>, even if the actual prose paragraph, or even the last sentence, is not finished.
Exceptions:
Aside from a few special tags (below), prose drama should be recorded like other prose (in <P>s, etc.) and verse drama like other verse (in <LG>s, <L>s, etc.), including the rules for interspersed poetry and prose.
Cast lists. Cast lists (often headed "dramatis personae") should be recorded like other lists, with the <LIST> tag. Cast lists will commonly appear as separate <DIV>s (within the <FRONT> matter of a book if the book contains one play). For complex cast lists, use nested lists and labels to indicate cast groupings.
Stage directions. Stage directions should be recorded with the <STAGE> element. Stage directions sometimes appear between the columns of a multicolumn text, or in the margin, where they look like notes. In other books, they may be centered (as if they were headings) or indented (as if they were little paragraphs). They are occasionally typographically distinct (in italics; within parentheses; or both).
Speakers. The name (sometimes abbreviated) of the speaker is recorded with <SPEAKER>. In print, these appear at the head of a speech: e.g. typically above the first line of the speech (sometimes centered), in the margin, in an indented line of its own at the head of the speech, or in italics at the beginning of the first line of the speech. Regardless of where it appears in print, the <SPEAKER> tag is tucked into the beginning of the appropriate <SP> ("speech") tag.
Additional text associated with the speaker's name should be included in the <SPEAKER> tag, like this: <SPEAKER>Mr. Jones, reading from letter.</SPEAKER> Multiple names should be enclosed in a single set of <SPEAKER> tags, like this: <SPEAKER>Mr. Jones and Mrs. Smith.</SPEAKER>
Speeches. The basic unit of drama is the SPEECH (<SP>). A speech normally continues uninterrupted as long as the character speaking it is uninterrupted by another speaker or by the end of a division (act, scene, etc.).
"Songs" and other material specially set off within a speech should not normally be given any special tagging; if they have headings, they may need to be recorded as a nested <LG>. In exceptional cases when they contain an elaborate structure they may be recorded as a quotation (<Q>).
Prologues and Epilogues should normally be treated as part of the play, recorded as <SP>s like any other speech, though they may sometimes require a numbered <DIV> of their own.
Acts and Scenes. The act/scene structure should be recorded with appropriately TYPEd and numbered <DIV>s (e.g., <DIV2 TYPE="act" N="3">).
Personal letters that appear as text divisions should be treated as <DIV>s just like any other text division (chapters, sections, etc.). Letters quoted within running text (e.g. a letter quoted within the chapter of a book) have been given a special tag, <LETTER>. Note that dedications frequently look like letters, since they contain salutations and signatures, but they're not: treat them as <DIV TYPE="dedication">. (You may, however, still use <OPENER> <CLOSER> <SIGNED> <SALUTE> etc. in such letter-like divisions, if they apply.)
Special tags are available to tag the salutations (<SALUTE>), signatures (<SIGNED>), and datelines (<DATELINE>) often found in letters. Use these only if they clearly apply.
Place <SALUTE>, <SIGNED>, and <DATELINE> within <OPENER> if they appear at the head of a letter; place them within <CLOSER> if they appear at the end. See the TEI guidelines for fuller descriptions of these elements. If salutations and signatures are combined or confused in a single opener or closer, use the <OPENER> or <CLOSER> tag alone, without trying to tag the separate constituent parts.
Affaicter. To trim, tricke, decke, dresse curiously, make neat, spruce, fine; to refine; also, to tame, reclaime, breake, make gentle, bring to ciuilitie.
Affaicter vn oiseau. To man a hauke throughly.
Affaicterie: f. A trimming, tricking, decking, neat, quaint, or fine dressing; also, neatnesse, nicenesse, curiositie, quaintnesse; also, a breaking, taming, reclayming, ciuilizing, making gentle; (hence) also, the through manning of a hauke, &c.
...can be recorded like this. The encoding of the phrasal subentry for "Affaicter vn oiseau" with a <DIV2> is probably superfluous in this case (a new paragraph with a <HI> heading would do as well); it is encoded more thoroughly here as an example of what can be done with more complex entries if necessary.
<DIV1 TYPE="entry"><HEAD>Affaicter.</HEAD> <DIV2> <P>To trim, tricke, decke, dresse curiously, make neat, spruce, fine; to refine; also, to tame, reclaime, breake, make gentle, bring to ciuilitie.</P> </DIV2> <DIV2 TYPE="subentry"> <HEAD>Affaicter vn oiseau.</HEAD> <P>To man a hauke throughly.</P> </DIV2></DIV1> <DIV1 TYPE="entry"> <HEAD>Affaicterie: f.</HEAD> <P>A trimming, tricking, decking, neat, quaint, or fine dressing; also, neatnesse, nicenesse, curiositie, quaintnesse; also, a breaking, taming, reclayming, ciuilizing, making gentle; (hence) also, the through manning of a hauke, &c.</P></DIV1>
Word-for-word interlinear gloss in (?) verse:
<L>Dirae
<NOTE PLACE="inter">fendes of fu|ryes of hell</NOTE>
&amp; opes
<NOTE PLACE="inter">ryches</NOTE>
Charites
<NOTE PLACE="inter">thre goddes of fauour</NOTE>
cheae&abque;
<NOTE PLACE="inter">brachia scor|pionis</NOTE>
facetiae
<NOTE PLACE="inter">vrbanitates</NOTE>
[...] </L>
<L>At&abque; fores
<NOTE PLACE="inter">a payre of gates</NOTE>
furiae
<NOTE PLACE="inter">fendes of hell</NOTE>
Parcae
<NOTE PLACE="inter">thre goddes fatall</NOTE>
Gratiae<NOTE PLACE="inter">thre goddes of fauour</NOTE>
quo&abque; [...] </L>
In general punctuation should be retained, but its spacing somewhat regularized. When a colon, semicolon, comma, question mark, closing quotation mark, or period falls between words, place a space after it, but none before it (unless it is being used to set off a number, like this: .lxvi. or .45. in which case it should be spaced as shown). When an opening quotation mark falls between words, place a space before it, but none after it. When a virgule falls between words, place a space before and after it. In case of doubt, follow the spacing of the original as best you can.
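The spacing rule above can be sketched in Python as a rough check on keyed text. This is an illustration under stated assumptions, not a project tool: the function name `regularize_spacing` is hypothetical, quotation marks are left alone (opening and closing marks share one keyboard character and cannot be told apart mechanically), and entity references and numbers set off by periods are shielded from respacing.

```python
import re

# Spans whose internal punctuation must not be respaced: entity references
# such as &abque; and numbers set off by periods such as .lxvi. or .45.
PROTECTED = re.compile(r"&[A-Za-z][A-Za-z0-9]*;|\.(?:[ivxlcdm]+|\d+)\.", re.IGNORECASE)


def regularize_spacing(text: str) -> str:
    """Respace : ; , ? . and the virgule when they fall between words."""
    stash = []

    def protect(match):
        # Swap the protected span for a placeholder that contains no punctuation.
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"

    work = PROTECTED.sub(protect, text)
    # No space before the mark, exactly one space after it.
    work = re.sub(r"(?<=\w)\s*([:;,?.])\s*(?=\w)", r"\1 ", work)
    # A virgule between words gets one space on each side.
    work = re.sub(r"(?<=\w)\s*/\s*(?=\w)", " / ", work)
    # Put the protected spans back unchanged.
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], work)
```

Because only marks that sit directly between word characters are touched, strings of dots and SGML tags pass through unchanged. In case of doubt the original spacing still governs, as the rule says.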
Record the various forms of colon, period, comma, semicolon, and virgule (slanted line) with their modern keyboard equivalents ( : . , ; / ); a vertical bar should be recorded using the &verbar; entity (since we have reserved the keyboard character for another purpose).
Question marks vary considerably in form (some of them looking like inverted semicolons); record them all with the standard "?"
Opening and closing double quotation marks should both be recorded using the ordinary keyboard double-quote character (" = HEX 22), not the &ldquo; and &rdquo; entities.
Opening and closing single quotation marks, as well as apostrophes, should be recorded with the same character, the ordinary keyboard single-quote character (' = HEX 27).
Hyphens (not dashes) should normally be recorded using the ordinary hyphen character.
Hyphens at the end of a line should be recorded as the ordinary keyboard "pipe" (vertical bar) character, unless they appear between numerals, when they should be recorded with the ordinary hyphen. Be aware that hyphens in many texts may appear as an angled stroke, not a horizontal one, and may also commonly appear doubled, resembling an equals sign (=), either horizontally or at an angle, like this:
If there is no end-of-line hyphen, but you think that there should have been (i.e., that a single word has been broken across two lines), place a plus sign, instead of a space, between the two halves: "cro+wn" "pri+nce"
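The end-of-line hyphen rule can be sketched as follows. This is a hedged Python illustration, with the hypothetical name `join_keyed_lines`. It assumes the printed lines arrive as a list of strings and that every end-of-line hyphen has been keyed as "-". The plus-sign case (a word broken with no hyphen printed) needs a human reader and is deliberately not attempted.

```python
import re


def join_keyed_lines(lines):
    """Join printed lines, recording end-of-line hyphens as '|'."""
    out = ""
    for i, raw in enumerate(lines):
        line = raw.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        broken = re.search(r"(.)-$", line)
        if broken and nxt:
            # A hyphen between numerals stays an ordinary hyphen.
            between_numerals = broken.group(1).isdigit() and nxt[0].isdigit()
            out += line[:-1] + ("-" if between_numerals else "|")
        else:
            # No end-of-line hyphen: separate the lines with a single space.
            out += line + (" " if nxt else "")
    return out
```

For instance, the two lines "for the Son-" and "daies, and other" join as "for the Son|daies, and other", matching the pipe usage in the sample table heading earlier in these guidelines.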
Dashes should be recorded using the entity &mdash;, regardless of where they appear.
Ellipses, whether two characters or many--strings of dots or asterisks indicating omitted or missing text--should be recorded as ordinary text, not the &hellip; entity, using periods or asterisks as appropriate: . . . . . * * * * * . .
Some editions of prose mark extended quotations by placing quotation marks at the beginning of every quoted line. The same technique is used in other books to mark proverbs and other sententious remarks. E.g.,
he made reasons...seyenge: God made alle thynges
" by reason, and governethe thynges
" made by reason; the sterres be movede by reason; and so
" oure naturalle lyfe excedynge from reason by slawthe and
" ignoraunce awe to be reducede by lawes and reasons.
" Wherefore thau3he there be somme thynges in the rule of
" seynte Benedicte, the intellect of whom the dullenesse of my
" mynde may not comprehende, y suppose hit be beste to 3iffe
" credence to auctorite. Wherefore also he persuadeth hymselfe ...

O no (said Cecropia) company confirmes reso-
" lutions, & lonelines breeds a werines of ones thoughts,
" and so a sooner consenting to reasonable profers.
If this is really a block quotation, and you can identify the beginning and end of it, go ahead and place the whole block of text marked by the marks inside <Q> tags. Bear in mind that the marginal quotation marks can take unusual forms (sometimes they look like a pair of commas), and that it is not always easy to discover where the quotation actually begins and ends. The upper example above could be encoded as:
<P> ... he made reasons...seyenge:
<Q>God made alle thynges by reason, and governethe thynges made by reason; the sterres be movede &startq; by reason; and so oure naturalle lyfe excedynge from reason by slawthe and ignoraunce awe to be reducede by lawes and reasons. Wherefore thau3he there be somme thynges in the rule of seynte Benedicte, the intellect of whom the dullenesse of my mynde may not comprehende, y suppose hit be beste to 3iffe &endq; credence to auctorite.</Q>
Wherefore also he persuadeth hymselfe ... </P>
Whether or not you can identify the quotation well enough to tag it as <Q>, record the first and last of the marginal quotation marks with the special entities &startq; (first mark) and &endq; (last mark). If there is only one such marginal quotation mark (as sometimes happens with short quotations or proverbs), use both entities in sequence (&startq;&endq;).
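The entity rule above can be sketched in Python. This is only an illustration: the function name `mark_marginal_quotes` and the sample lines in the test are invented, and the sketch assumes that one run of marginal marks is passed in at a time, with each mark keyed as a leading double-quote character.

```python
def mark_marginal_quotes(lines, mark='"'):
    """Keep only the first and last marginal marks, as &startq; and &endq;."""
    marked = [i for i, line in enumerate(lines) if line.lstrip().startswith(mark)]
    out = []
    for i, line in enumerate(lines):
        text = line.strip()
        if i in marked:
            # Drop the marginal mark itself; only the entities survive.
            text = text[len(mark):].lstrip()
            if len(marked) == 1:
                text = "&startq;&endq; " + text
            elif i == marked[0]:
                text = "&startq; " + text
            elif i == marked[-1]:
                text = "&endq; " + text
        out.append(text)
    return " ".join(out)
```

Intermediate marks are dropped silently. Whether the passage is also wrapped in <Q> remains an editorial decision that this sketch does not make.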
Braces and brackets that group multiple lines should be ignored if all they do is group portions of ordinary running text, such as poetry. But if they are used to link one piece of text to another, such as frequently in tables and lists, their meaning needs to be interpreted. Sometimes this will require entering text more than once, e.g. if the brace means "this word applies to all these other words," the easiest technique may be simply to apply the word to all of the other words by entering it as many times; sometimes it may require treating the single item as a head or label for a list containing the grouped items; sometimes it may involve attaching a ROWS or COLS attribute to a table <CELL>. Many variations are possible, which the following examples can only suggest.
chapter {
    1 How to build a kite
    2 When to fly a kite
    3 Famous kite flyers of our time
    4 When not to fly a kite
    5 "I've flown it: now what?"
(Brace used like "ditto" mark to associate one word repeatedly with a series of items; may be recorded as follows, by repeating the word:)
<LIST>
<LABEL>chapter 1</LABEL> <ITEM>How to build a kite</ITEM>
<LABEL>chapter 2</LABEL> <ITEM>When to fly a kite</ITEM>
<LABEL>chapter 3</LABEL> <ITEM>Famous kite flyers of our time</ITEM>
<LABEL>chapter 4</LABEL> <ITEM>When not to fly a kite</ITEM>
<LABEL>chapter 5</LABEL> <ITEM>"I've flown it: now what?"</ITEM>
</LIST>

Dramatis Personae
townspeople {
    Joe
    Mary
    Bothom
    Josephus
Joan, a noblewoman
John, a philosopher
(Brace used to associate one item as a head of a set of other items; may be recorded as follows, placing the one item in <HEAD> tag and the list of items in <LIST> and <ITEM> tags:)
<LIST>
<HEAD>Dramatis Personae</HEAD>
<LABEL>townspeople</LABEL>
<ITEM>
<LIST>
<ITEM>Joe</ITEM>
<ITEM>Mary</ITEM>
<ITEM>Bothom</ITEM>
<ITEM>Josephus</ITEM>
</LIST>
</ITEM>
<ITEM>Joan, a noblewoman</ITEM>
<ITEM>John, a philosopher</ITEM>
</LIST>

(Brace used in a table to place one cell in conjunction with a set of other cells; may be recorded using the COLS or ROWS attribute of the <CELL> tag:)
<TABLE>
<ROW> <CELL>In apice trianguli.</CELL> <CELL ROWS="3">Triangulus.</CELL> </ROW>
<ROW> <CELL>In basi praecedens 3.</CELL> </ROW>
<ROW> <CELL>Sequens &amp; vltima. 3.</CELL> </ROW>
</TABLE>
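The first brace example above, in which one word is repeated for each of a series of items, can be sketched in Python. The function name `ditto_list` is hypothetical, and the sketch assumes each item has already been split into its number and its text.

```python
from xml.sax.saxutils import escape


def ditto_list(shared_word, items):
    """Repeat the braced word in the <LABEL> of every item it governs."""
    parts = ["<LIST>"]
    for number, text in items:
        # escape() turns &, < and > into entity references.
        parts.append(f"<LABEL>{escape(shared_word)} {number}</LABEL>")
        parts.append(f"<ITEM>{escape(text)}</ITEM>")
    parts.append("</LIST>")
    return "\n".join(parts)
```

Called with "chapter" and the numbered kite titles, it yields the same <LABEL>/<ITEM> pairs as the hand-keyed sample. The other two brace uses (head of a nested list, spanning table cell) depend on interpretation and are not covered.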
Basic letter forms. Most letters encountered will belong to the modern alphabet, though their appearance may be strange.
Ligatures. Ligatured characters (ae, oe, ct, st, sp, fi, ff, ss, etc.) should be recorded as two separate characters. Ignore the ligature. Be aware that the italic "ae" ligature usually has no upper bow to the "a" and is easily mistaken for "oe". Be aware also that italic fonts especially tend to have ligatures between many more pairs of letters than we are accustomed to.
Ampersands, whether shaped like & or like "7," should be recorded as &amp;.
Some examples of ampersands (&)
Letters printed upside-down (a common printer's error) should be recorded as if turned right side up.
Recognizable letters with diacritics
Some examples of 'macrons' (~)
Some "general" abbreviation diacritics | |||
---|---|---|---|
![]() | <ABBR>Cantuar</ABBR>, | ![]() | <ABBR>clico</ABBR>, |
![]() | <ABBR>Cantuar</ABBR>, | ![]() | <ABBR>clico</ABBR>, |
![]() | <ABBR>Suff</ABBR>, | ![]() | <ABBR>qd</ABBR>, |
![]() | <ABBR>Marchionis</ABBR> | ![]() | <ABBR>Ric</ABBR> |
![]() | <ABBR>Alred</ABBR> | ![]() | <ABBR>vl</ABBR> |
![]() | <ABBR>red</ABBR> | ![]() | <ABBR>apd</ABBR> |
Some common abbreviations by superscript | |
---|---|
"thou" (y^u) | ![]() |
"that" (y^t) | ![]() ![]() ![]() ![]() |
"the" (y^e) | ![]() ![]() |
"with" (w^t) | ![]() |
Symbol | Record as: | Meaning | Examples: | conditions: |
---|---|---|---|---|
![]() | &abper; | per, par | ![]() ![]() ![]() | |
![]() | &abpro; | pro | ![]() ![]() | |
![]() | &abus; | -us | ![]() ![]() ![]() | at the end of a word only |
![]() | &abque; | -que | ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() | at the end of a word only |
![]() | &absed; | sed | ![]() | only when forming a word by itself |
![]() | &abser; | ser | .. | |
![]() | &abcon; | con- cum- | ![]() | at the beginning of a word only |
![]() | &abrum; | -rum | ![]() | at the end of a word only |
![]() | &abis; | -is | ![]() ![]() ![]() ![]() ![]() | at the end of a word only |
Letters from other alphabets, e.g. Hebrew and Greek, when used singly (as opposed to in whole words or extended text) should be recorded with ISO standard character entities.
Other symbols include alchemical and astrological symbols, which will rarely if ever appear as part of words, but may appear in or as marginal notes, in designations of units of measure, in calendrical tables, etc.
Symbol | Example | Meaning | Record as |
---|---|---|---|
Zodiacal signs | | | |
![]() | ![]() | Aries | &Aries; |
![]() | ![]() | Taurus | &Taurus; |
![]() | ![]() | Gemini | &Gemini; |
![]() | | Cancer | &Cancer; |
![]() | | Leo | &Leo; |
![]() | | Virgo | &Virgo; |
![]() | | Libra | &Libra; |
![]() | | Scorpio | &Scorp; |
![]() | ![]() | Sagittarius | &Sagitt; |
![]() | ![]() | Capricorn | &Capri; |
![]() | ![]() | Aquarius | &Aquar; |
![]() | ![]() | Pisces | &Pisces; |
Planetary signs | | | |
![]() | | Sun | &Sun; |
![]() ![]() | ![]() | Moon | &Moon; |
![]() | ![]() | Mercury | &Merc; |
![]() | ![]() | Venus | &Venus; |
![]() | | Earth | &Earth; |
![]() | ![]() | Mars | &Mars; |
![]() | ![]() | Jupiter | &Jupit; |
![]() | ![]() | Saturn | &Saturn; |
Other signs | | | |
 | | cross | &cross; |
![]() | ![]() ![]() | capitulum (paragraph) | &para; |
Dubious characters. Individual characters that cannot be readily identified as one thing or another ("Is this a funny-looking 'q' or some kind of symbol?" "Is this a 'c' or a 't'?") should be recorded as "$". However, do not overuse this expedient: if the same symbol recurs repeatedly in a book, ask us for help in identifying it; do not simply record dozens or hundreds of examples of the same symbol with "$".
"Excessive" abbreviation. If sampling shows that more than one word in every ten in a given text contains an abbreviation symbol, a dubious mark ($), a peculiar symbol mark (#), or an <ABBR> tag, the work should be rejected for conversion.
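The one-word-in-ten test above can be sketched in Python. This is an illustration only: the names `abbreviation_rate` and `should_reject` are hypothetical, words are taken to be whitespace-separated tokens of a plain-text sample, and an "abbreviation symbol" is assumed to be any entity whose name begins with "ab" (such as &abque;), which is a guess based on the entity table in these guidelines.

```python
import re

# A word is flagged if it holds an abbreviation entity (assumed: &ab...;),
# a dubious mark ($), a peculiar symbol mark (#), or an opening <ABBR> tag.
FLAG = re.compile(r"&ab[a-z]+;|[$#]|<ABBR>", re.IGNORECASE)


def abbreviation_rate(sample: str) -> float:
    """Return the fraction of whitespace-separated words that are flagged."""
    words = sample.split()
    if not words:
        return 0.0
    flagged = sum(1 for word in words if FLAG.search(word))
    return flagged / len(words)


def should_reject(sample: str, threshold: float = 0.10) -> bool:
    """True when more than one word in ten is flagged."""
    return abbreviation_rate(sample) > threshold
```

A sample with exactly one flagged word in ten sits at the threshold and is not rejected, since the rule speaks of more than one word in every ten.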
Illegible text (text that is blurred, blotted, bled-through, or otherwise hard or impossible to read) should be surrounded with the tag <UNCLEAR>. If the text appears to have been deliberately erased or crossed-out, specify deletion as the cause with <UNCLEAR CAUSE="del">. Text within the <UNCLEAR> tags should be recorded as usual. E.g., characters, diacritics, superscripts, and symbols should be recorded as above as far as possible; characters that can't be identified with any confidence should be recorded as "$"; characters that are more or less completely gone or too damaged to read as "@"; peculiar symbols as "#"; and so forth. Discernible features, too, within an UNCLEAR span should be recorded as usual.
![]()
transcribed as:
as <UNCLEAR>$p@$</UNCLEAR> hostility
![]()
transcribed as:
one <UNCLEAR>accord @@@</UNCLEAR>
The following samples are far from a definitive list of letter forms, but are meant only to provide some help recognizing the most common letters in the most common typefaces. Many books will have to be considered individually, the form(s) of each letter ascertained by its presence in a recognized word or unambiguous context so as to create, in effect, an alphabet or set of alphabets for that book. The samples below are arranged under headings that describe the most common families of type: roman, italic, textura, rotunda, and bastarda. There are many variants of each of these (except rotunda, which is fairly uniform) which may differ very considerably from the examples given here. And individual misprinted and ill-aligned letters may present a very anomalous appearance.
Record as: | Textura | Italic | Bastarda | Rotunda |
---|---|---|---|---|
a | ![]() | ![]() | ![]() | ![]() |
b | ![]() | ![]() | ![]() ![]() | ![]() |
c | ![]() | ![]() | ![]() | ![]() |
d | ![]() ![]() | ![]() | ![]() ![]() | ![]() |
e | ![]() | ![]() | ![]() | ![]() |
f | ![]() | ![]() | ![]() | ![]() |
g | ![]() | ![]() ![]() | ![]() | ![]() |
h | ![]() ![]() | ![]() ![]() | ![]() | ![]() ![]() |
i | ![]() | ![]() | ![]() | ![]() |
j | ![]() | |||
k | ![]() ![]() | ![]() | ![]() | |
l | ![]() ![]() | ![]() ![]() | ![]() ![]() | ![]() |
m | ![]() ![]() | ![]() | ![]() | ![]() |
n | ![]() | ![]() | ![]() ![]() | ![]() |
o | ![]() | ![]() | ![]() | ![]() |
p | ![]() | ![]() | ![]() | ![]() |
q | ![]() | ![]() | ||
r | ![]() ![]() ![]() | ![]() | ![]() ![]() ![]() | ![]() ![]() |
s | ![]() ![]() | ![]() ![]() | ![]() ![]() ![]() | ![]() ![]() |
t | ![]() | ![]() | ![]() | ![]() |
u | ![]() | ![]() | ![]() | ![]() |
v | ![]() ![]() | ![]() ![]() | ![]() | ![]() |
w | ![]() | ![]() ![]() | ![]() ![]() | ![]() |
x | ![]() | ![]() | ![]() | |
y | ![]() ![]() | ![]() | ![]() | ![]() |
z | ![]() | ![]() |