This document lists questions received from data conversion firms with regard to their prospective participation in the project, together with the answers. It also lists updates to the keying guidelines and other announcements, in order that all information supplied to any firm is available to all.
ADDENDUM: See also Question 26 below.
ADDENDUM: Some links have now been added, and more will be. 10/26/00.
(2) There is no fixed protocol yet for assessing character accuracy. In the past we have been reasonable (even generous), refusing to regard as errors mistakes that could not reasonably have been avoided without substantial interpretation of the text or context. Establishing such a protocol will be one of the first tasks once we begin production and have hard data to base it on.
On the larger question, the database contains virtually every early English printed book; therefore every possible type of book is to be found in it. The selection principle (as described in the answer to another question; see the Question Log) currently in place will probably mean that "literary" texts, including drama and poetry, will be somewhat overrepresented, but there will certainly be sermons, plays, reference books, philosophic and scientific works, laws, chronicles, practical books of every sort, as well, of course, as Bibles, tracts, treatises, and commentaries. Even among the sample pages you will find an herbal, a poetic manual, several teaching manuals and grammars, a prose romance, poetry, rhetoric, a collection of proverbs, a work on linguistics, an allegory, ecclesiastical history, a Biblical commentary, sermons, and several tracts and treatises.
We expect to continue to use this method, unless it proves impractical. But no text will be rejected on the basis of anything less than 5% of its pages. See also the comments made in answer to one of the questions in the Question Log, regarding the criteria for assessing accuracy.
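As a rough illustration of the 5% floor mentioned above, the following Python sketch computes the smallest sample that would satisfy it for a text of a given length. The function names, the 1-based page numbering, and the use of random selection are assumptions made for the example; the answer above does not say how sample pages are chosen.

```python
import random

def min_sample_pages(total_pages, percent=5):
    """Smallest whole number of pages that is at least `percent`% of the text."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_pages * percent + 99) // 100

def pick_sample(total_pages, seed=None):
    """Pick random 1-based page numbers meeting the floor (random choice is an assumption)."""
    rng = random.Random(seed)
    k = min_sample_pages(total_pages)
    return sorted(rng.sample(range(1, total_pages + 1), k))

print(min_sample_pages(240))           # 12
print(len(pick_sample(240, seed=1)))   # 12
```

Rounding up rather than down means that a short text is never judged on less than the stated proportion: a 101-page text would need 6 sample pages, not 5.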
As for the declaration of the LANG value as CDATA, replacing CDATA with something else (a list of possible values or IDREFS) would not help matters any, since under those circumstances you could still insert the wrong value (e.g., put "lat" instead of "eng"). So I am not sure why you bring it up in this connection.
This question also seems to overstate the importance in the instructions of attribute values; to ignore several instructions that specify values for attributes even when the specifications are not embodied in the DTD; and to ignore the various instructions that require the vendor to supply no value when there is no obvious value to supply.
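The point about the LANG declaration can be illustrated with a small Python sketch. It imitates the check a validating parser could make against an enumerated list of values; the list itself and the function name are hypothetical, and only "lat" and "eng" come from the answer above.

```python
# Hypothetical enumerated list; the actual DTD declares LANG as CDATA.
ALLOWED_LANG = {"eng", "lat", "fre"}

def lang_is_valid(value):
    """Mimic a parser checking LANG against an enumerated list of values."""
    return value in ALLOWED_LANG

actual_language = "eng"   # the passage is really in English
keyed_value = "lat"       # but it was keyed as Latin

print(lang_is_valid(keyed_value))        # True: the value passes validation
print(keyed_value == actual_language)    # False: it is still the wrong value
print(lang_is_valid("latn"))             # False: only a value outside the list is caught
```

A stricter declaration would catch a mistyped value, but not a legitimate value applied to the wrong passage, which is the kind of mistake at issue here.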
I assume that by "alphabets" you mean type faces; but if you are referring to the distinction between the Latin alphabet and (say) the Greek or Hebrew alphabets, this distinction is usually quite marked in the books I have seen.