MODERN LANGUAGE ASSOCIATION OF AMERICA
COMMITTEE ON SCHOLARLY EDITIONS
Preliminary Guidelines for Electronic Scholarly
Editions (June 2002)
NOTE: This preliminary version of the Guidelines for
electronic scholarly editions has been superseded by the Guidelines
approved by the Committee for both print and electronic editions:
These guidelines are intended to help scholarly editors, publishers,
CSE consultants, and CSE reviewers in carrying out their respective
functions; they reflect the principles articulated in the MLA brochure
"Aims and Services of the Committee on Scholarly Editions,"
parts of which are quoted below. [NOTE: Copies of the CSE brochure may be
ordered from the Committee on Scholarly Editions, Modern Language
Association of America, 10 Astor Pl., New York, NY 10003-6981. An
electronic version may be found at www.mla.org/cse/0000.] The guidelines
for electronic scholarly editions are closely based on the guidelines for
printed editions. Their goal is to enhance the usability and reliability
of scholarly editions by making full use of the capabilities of the
computer. At this stage, the guidelines are phrased in terms of desiderata
rather than requirements, since hardware and software capabilities are
changing so rapidly and some desirable features are not yet technically
or economically feasible. Because of this, the CSE encourages the greatest
flexibility in carrying out the technical suggestions set forth below.
While the CSE has over thirty years of experience with printed scholarly
editions (and the scholarly world at large has several centuries), few
useful models exist for electronic editions. Therefore experimentation and
a variety of approaches are to be encouraged.
What cannot be compromised, however, is the scholarly quality of the
edition. Exactly the same standards of accuracy, thoroughness, and detail
must obtain for an electronic scholarly edition as for a printed one. In
both, the reliability of the text is paramount.
The CSE does not prescribe a particular method of editing; the
committee's position is that different approaches are appropriate in
different situations. The CSE emphasizes that editors who are thoroughly
acquainted with editorial options applicable to their materials and with
the relevant documentary texts and who are sensitive to the circumstances
attending the composition and production of all forms of the text are in a
position to choose editorial procedures appropriate to their materials,
carry out those procedures accurately and consistently, and explain
exactly what they have done and why.
Standards for the "Approved Edition" and "Approved
Text" Emblems (based on the 1991 CSE Statement of "Aims and Services")
The editorial standards that form the criteria for the award of the CSE
"Approved Edition" emblem can be stated here in only the most
general terms, as the range of editorial work that comes within the
committee's purview makes it impossible to set forth a detailed,
step-by-step editorial procedure. Whatever specific editorial theory and
procedures may be used, the editor's basic task is to establish a reliable
text. In an electronic edition, the provision of basic transcriptions and
tools that allow alternative views of the text and that permit others to
build upon existing editorial work is almost as important. Many, indeed
most, scholarly editions include a general introduction -- either
historical or interpretive -- as well as explanatory annotations to
various words, passages, events, and historical figures. Although neither
is essential to the editor's primary responsibility of establishing a
text, both can add to the value, that is, the usefulness, of the edition.
Whatever additional materials are included, however, the CSE considers the
following essential for a scholarly edition:
- A textual essay, which sets forth the history of the text and its
physical forms, describes or reports the authoritative or significant
texts, explains how the text of the edition has been constructed or
represented, gives the rationale for all decisions affecting its
construction or representation, and discusses the verbal composition of
the text as well as its punctuation, capitalization, and spelling. When it
becomes technically feasible to do so, textual examples used in the essay
might take the form of hypertext linkages to the edition itself
rather than copies of the relevant passages. While it might not be
possible to carry this practice through consistently for all texts cited
(sources, analogues, translations, secondary bibliography, etc.), in
principle it is highly desirable in order to avoid insofar as possible
misquotations of those texts.
- An appropriate editorial apparatus and/or notes, or functional
equivalent thereof, which (1) record authorial alterations and editorial
emendations of the basic text(s) (e.g., a full-text transcription of the
basic text(s) keyed to the edited text will make plain the alterations and
emendations in the latter), (2) discuss problematical readings (if not
treated in the textual essay), and (3) report variant substantive readings
from all versions of the text that might carry authority (thus full-text
transcriptions of all versions of the text that might carry authority
obviate the need to report variant readings). These three kinds of
information need not be presented in any specific arrangement, and not all
obtain in every situation, but the CSE requires that, when applicable,
they either appear in each edition bearing the "Approved
Edition" emblem or be otherwise available at the time of
If the apparatus is replaced by full-text transcriptions, mechanisms
are needed to display passages selected in parallel and to create collated
lists of textual variants in various categories (e.g., substantive,
- A proofreading plan that provides for meticulous proofreading at
every stage of production so that the accuracy of the text, textual essay,
and textual apparatus is not compromised. Automated proofreading programs
("spell checkers"), word lists, and computerized collation or
file-comparison programs can be used to alleviate the burden, but they
cannot substitute for manual proofreading, nor should they ever be allowed
to make unverified changes in the text.
In addition to the textual essay, editorial apparatus, and
proofreading plan, equally applicable to printed and electronic editions,
the following requirements also obtain for any electronic edition seeking
- It must employ non-proprietary encoding standards.
- It must be self-describing.
- It must include retrieval software.
Each of these requirements is spelled out in more detail below.
The guidelines below suggest some considerations that the CSE regards
as fundamental to the preparation and publication of useful, reliable
scholarly editions. They cover the kinds of inquiries that an editor,
reviewer, publisher, or informed critic needs to make in order to form a
judgment about the accuracy and completeness of a scholarly edition, and
they can therefore serve as a working checklist of matters that may demand
attention in producing scholarly editions.
Just as no list of general guidelines can anticipate all of the special
problems in a particular edition, so also many of the points mentioned
below will not be applicable to every edition -- e.g., Section IV.C
"Collations" would not be relevant to a diplomatic edition of a
single text. The guidelines are intended only to provide a broad framework
for identifying issues and for dealing with them reasonably.
For an electronic scholarly edition, perhaps the single most crucial
decision is the choice of encoding standard. Internationally accepted and
publicly defined norms, as set forth below, are preferable to proprietary
systems. If the norms are chosen correctly, the edition can be migrated
easily to new hardware and software platforms, thus preserving the work
that has gone into it.
- Of paramount concern is the necessity of standardizing the character
set, encoding norms, and documentation of the source documents and the
electronic edition itself. These elements should be as
machine- and software-independent as possible and in sufficiently
widespread use that they can reasonably be expected to be ported into
future systems without too much difficulty, since a well-prepared
electronic edition will in all likelihood outlast the hardware and
software environment in which it was produced. Editors must distinguish
between the intellectual requirements of the edition and the requirements
of its preparation, distribution, and use.
- Character set. For maximum portability the recommended character sets
are ANSI standard X3.4-1986 (lower ASCII), with 128 characters, ISO 646
(82 characters), or Unicode. In certain disciplines other coding schemes
of long standing exist and may be used (e.g., beta coding for classical
Greek). In some cases unique codes using these character sets may need to
be devised in order to represent special characters. The character set
should be explicitly declared as part of the edition itself (e.g., as Text
Encoding Initiative [TEI] Writing System Declarations or as SGML
- Encoding norms. It is preferable to use the implementation of
Standard Generalized Markup Language (SGML) specifically devised for
coding electronic texts, the Text Encoding Initiative (TEI). The choice of
an alternate standard should be fully justified and explained.
- The text itself should be essentially self-describing, which means
that the computer file which embodies it should contain a header with
essential "metadata." The Guidelines for Electronic Text
Encoding and Interchange (TEI P3), edited by C.M. Sperberg-McQueen and
Lou Burnard (1994), offer detailed descriptions of the sorts of information
that should be provided for the source document as well as the electronic
text itself (see chap. 5 of the TEI guidelines). Metadata should include:
- A description of the file itself and the sources used in its
preparation (although the description given here need not be as detailed
as that found in the introductory essay) (File Description).
- The encoding system used (Encoding Description).
- The level of encoding should respond to the purpose of the edition.
However, at a minimum any edition should encode elements which by any
reasonable standard are of general importance and objectively determinable
(e.g., the text structure itself -- chapters, acts, scenes). Any encoding
scheme should be extensible in order to allow the later encoding of
- Contextual information concerning the subject matter of the text as
well as the basic information about the editor(s) (Profile
- Information concerning the changes made to the file in the course
of its preparation (Revision Description).
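The four header components listed above can be illustrated with a short Python sketch that assembles a skeletal header using only the standard library. The top-level element names (teiHeader, fileDesc, encodingDesc, profileDesc, revisionDesc) are those of the TEI header; the function name build_header, the simplified internal structure, and all sample values are assumptions made for illustration. The result has not been validated against any TEI DTD and is far less detailed than a real header would need to be.

```python
import xml.etree.ElementTree as ET


def build_header(title, source, encoding_note, keyword, change_note):
    """Assemble a skeletal header with the four parts listed above."""
    header = ET.Element("teiHeader")

    # File Description: the file itself and the sources used to prepare it.
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = source

    # Encoding Description: the encoding system and level of encoding.
    encoding_desc = ET.SubElement(header, "encodingDesc")
    ET.SubElement(encoding_desc, "p").text = encoding_note

    # Profile Description: contextual information about the subject matter.
    profile_desc = ET.SubElement(header, "profileDesc")
    text_class = ET.SubElement(profile_desc, "textClass")
    keywords = ET.SubElement(text_class, "keywords")
    ET.SubElement(keywords, "term").text = keyword

    # Revision Description: changes made to the file during preparation.
    revision_desc = ET.SubElement(header, "revisionDesc")
    ET.SubElement(revision_desc, "change").text = change_note

    return header


# Hypothetical sample values, for illustration only.
header = build_header(
    title="Sample poem: an electronic edition",
    source="Transcribed from a hypothetical 1609 quarto.",
    encoding_note="Structure encoded to the stanza and line level.",
    keyword="lyric poetry",
    change_note="First proofread transcription.",
)
print(ET.tostring(header, encoding="unicode"))
```

Generating the header programmatically is only one option; the point of the sketch is that each category of metadata has a fixed, named place in the file.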
- Coupled with this is the necessity of a mechanism to authenticate
the contents of the file (e.g., a hashing algorithm using a time-stamping
mechanism to generate a unique id number). Because of the ease with which
electronic texts can be changed, users must be able to satisfy themselves
that the file in fact is what it purports to be.
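One possible authentication mechanism is sketched below in Python. It assumes SHA-256 as the hashing algorithm and a UTC time stamp recorded alongside the digest; neither choice is prescribed by these guidelines, and the function names are hypothetical. A digest of this kind only helps users if it is published through a channel they already trust.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(data: bytes) -> dict:
    """Return a SHA-256 digest of the file contents plus a UTC time stamp."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "stamped": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Check a copy of the file against a previously recorded digest."""
    return hashlib.sha256(data).hexdigest() == recorded_digest


# Hypothetical file contents, for illustration only.
text = b"Whan that Aprille with his shoures soote\n"
record = fingerprint(text)

print(is_unchanged(text, record["sha256"]))            # same bytes
print(is_unchanged(text + b" ", record["sha256"]))     # one byte added
```

Any change to the file, however small, yields a different digest, which is what allows a user to confirm that a copy matches the published edition.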
- Similarly, formats for other media included in the edition (sound,
image, video) should conform to non-proprietary standards.
- While the format and content of electronic editions can,
appropriately, vary as much as those of print editions, it seems clear
that the possibility of digitized facsimiles of the original source
materials, especially the copy text, would enhance the usability and
reliability of virtually any electronic edition. Notionally, one can
conceive of the utility of a hypermedia archive, comprising digitized
facsimiles of all textual witnesses, encoded electronic transcriptions of
each witness linked to it, and a critical text linked to those
transcriptions, along with annotations, sources, analogues, etc. In
practice, the cost of preparing such archives for long texts with many
witnesses is likely to be prohibitive.
Appropriate non-textual materials
(e.g., illustrations, recordings of poetry read by the author or
performances of dramatic works) can only enhance the scholarly value of
the edition. In some cases, non-textual materials form an integral part of
the edition. They should be treated with as much care and attention as the
- Annotation of digitized facsimiles as well as linking of image to
transcription at the line or word level would greatly facilitate scholarly
use of such materials.
- Similarly, alignment of parallel texts (witnesses to a single text,
translations) at least to the line level would also facilitate scholarly
use. Line breaks in base transcriptions should be retained so that they
may be shown (if desired) when the text is displayed in different-sized
- Archival format: The "preservation form" of the text
should be non-proprietary and as machine- and software-independent as
possible (e.g., TEI conformant).
- The master digital archive should be maintained on a server,
preferably network-accessible and ideally in the custody of an institution
that can guarantee preservation of the archive and migration to suitable
hardware and software platforms as technology changes (e.g., a university
library or electronic text archive).
- A read-only version of the preservation form of the text should
also be maintained (e.g., on a CD-ROM disk, digital linear tape, or other
long-term storage medium).
- Delivery software involves both presentational and analytical
software. Given the current existence of three widespread software
platforms (MS-DOS/Windows, Macintosh, UNIX) and distribution on removable
media (e.g., diskette, CD-ROM) or the Internet, it seems likely that most
electronic editions will not be universally available to all users in
their most sophisticated form.
- Presentational and analytical software should ideally be widely
available (commercial, shareware, or public domain) for a variety of
platforms and should have a reasonable life expectancy. Although
electronic editions need not be published commercially, they should be
made available in standardized formats, e.g., CD-ROM disks in ISO 9660
format or DVD, preferably not limited to a given computer platform. CD-ROM
disks have the great advantage of fixing the form of the text at a given
time, much like a traditional paper edition; but they do not allow for
additions and corrections except through the release of a second
- Network access from a central location, or text archive, although
not essential, is highly desirable, both to minimize the proliferation of
variant texts and to facilitate revisions. Network access may obviate the
necessity of providing platform-specific versions, since Internet browsing
tools exist already for each platform. The current (1997) version of HTML
is not adequate for serious scholarly purposes, since it is concerned with
formatting, not the encoding of a text's logical structure;
versions of SGML-marked-up text may be suitable delivery mechanisms.
When later standards (e.g., XML) approach the capabilities of SGML,
they may be considered as acceptable alternatives.
- Such momentary limitations can be overcome, however, by preserving
the text in a more sophisticated archival form (e.g., SGML) and then
converting it into other formats for presentation.
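The conversion from an archival form to a presentation format can be sketched as follows. Because Python's standard library parses XML but not SGML, well-formed XML stands in here for the archival encoding; the element names (text, div, head, l), the sample verse, and the mapping to HTML elements are all assumptions chosen for illustration, not a recommended stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical archival encoding: logical structure only, no formatting.
ARCHIVAL = """<text>
  <div type="stanza">
    <head>Song</head>
    <l>Go and catch a falling star,</l>
    <l>Get with child a mandrake root,</l>
  </div>
</text>"""


def to_html(xml_source: str) -> str:
    """Derive a simple HTML presentation from the structural encoding."""
    root = ET.fromstring(xml_source)
    out = []
    for div in root.iter("div"):
        out.append('<div class="%s">' % escape(div.get("type", "")))
        for child in div:
            if child.tag == "head":
                out.append("<h2>%s</h2>" % escape(child.text or ""))
            elif child.tag == "l":
                out.append("<p>%s</p>" % escape(child.text or ""))
        out.append("</div>")
    return "\n".join(out)


html_page = to_html(ARCHIVAL)
print(html_page)
```

The archival file is never altered; the presentation is regenerated from it whenever the delivery format changes.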
- Hypertext capabilities. The software chosen should allow for the
use of hypertext, preferably with the capability to allow the user to add
personal links as well as to annotate the text locally.
- The editorial principles should include a rationale of the kinds of
hypertext links (two-way, one-way) used as well as of the categories of
information that they are used to connect (e.g., sources, textual
parallels, textual notes). The links themselves should include information
to indicate their scholarly purpose and to facilitate searching by
category (e.g., source).
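A link that carries its own category and direction might be modelled as in the Python sketch below. The class name Link, its fields, and the passage identifiers are hypothetical; the sketch shows only that typed links make searching by category, and the handling of one-way as opposed to two-way links, straightforward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str            # identifier of the passage where the link starts
    target: str            # identifier of the passage or note it points to
    category: str          # scholarly purpose, e.g. "source", "textual note"
    two_way: bool = False  # whether the link may also be followed backwards


def by_category(links, category):
    """Return every link recorded under the given category."""
    return [link for link in links if link.category == category]


def links_from(links, passage):
    """Links leaving a passage, plus two-way links that arrive at it."""
    return [
        link for link in links
        if link.source == passage or (link.two_way and link.target == passage)
    ]


# Hypothetical links, for illustration only.
LINKS = [
    Link("poem.l12", "source.ovid.3", "source"),
    Link("poem.l12", "note.t4", "textual note"),
    Link("poem.l30", "poem.l12", "textual parallel", two_way=True),
]

print(by_category(LINKS, "source"))
print(len(links_from(LINKS, "poem.l12")))
```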
- Analytic software similarly should be widely available and not
limited to a single platform.
- Analytic software might include:
- Retrieval software (e.g., TACT). Retrieval software frequently uses
an indexed data base. Such a data base should include every individual
word form as well as (preferably) access to lemmatized forms. The latter
is particularly necessary for old spelling editions. Texts should also be
available in a non-indexed form.
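The difference between indexing word forms and indexing lemmatized forms can be shown with a minimal Python sketch. The lemma table, the two sample lines, and the function name build_index are made up for illustration; a real edition would draw its lemmata from a lemmatizer or a machine-readable dictionary, and nothing here reflects how TACT works.

```python
import re
from collections import defaultdict

# Hypothetical table mapping old-spelling forms to their lemmata.
LEMMAS = {"loue": "love", "loues": "love", "hart": "heart", "haue": "have"}


def build_index(lines):
    """Index every word form, and every lemma, by line number."""
    forms = defaultdict(list)
    lemmas = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z]+", line.lower()):
            forms[word].append(number)
            # Forms missing from the table are treated as their own lemma.
            lemmas[LEMMAS.get(word, word)].append(number)
    return forms, lemmas


# Hypothetical old-spelling lines, for illustration only.
lines = [
    "My true loue hath my hart and I haue his",
    "He loues my hart for once it was his owne",
]
forms, lemmas = build_index(lines)

print(forms["loue"], forms["loues"])  # each form found separately
print(lemmas["love"])                 # both forms found under one lemma
```

A search on the form "loue" finds only the first line, whereas a search on the lemma "love" finds both, which is why lemmatized access matters for old spelling editions.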
- Collation software (e.g., CASE, COLLATE, UNITE). If the editor has
constructed a critical text on the basis of full text transcriptions,
collation software allows the user to verify the editor's critical
practice as well as vary the editorial assumptions (e.g., by selecting
another version as a base text) and criteria (e.g., preservation of
accidentals). Moreover, collation software allows the user to prepare a
subedition of an individual family in a complex textual tradition, thereby
facilitating reception studies.
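The idea of letting the user choose a different base text can be sketched with Python's standard difflib module. This is a word-level comparison under simplifying assumptions (whitespace tokenization, invented witnesses and sigla); it is not a description of how CASE, COLLATE, or UNITE operate.

```python
from difflib import SequenceMatcher

# Hypothetical witnesses keyed by siglum, for illustration only.
WITNESSES = {
    "A": "the quick brown fox jumps over the lazy dog",
    "B": "the quick browne fox leaps over the lazy dog",
    "C": "the quick brown fox jumps over a lazy dog",
}


def collate(witnesses, base):
    """List (position, base reading, siglum, variant reading) against a base."""
    base_words = witnesses[base].split()
    variants = []
    for siglum, text in witnesses.items():
        if siglum == base:
            continue
        words = text.split()
        matcher = SequenceMatcher(a=base_words, b=words, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                variants.append(
                    (i1, " ".join(base_words[i1:i2]), siglum, " ".join(words[j1:j2]))
                )
    return sorted(variants)


for entry in collate(WITNESSES, "A"):
    print(entry)

# The same transcriptions collated against a different base text.
for entry in collate(WITNESSES, "B"):
    print(entry)
```

Because the apparatus is computed from full transcriptions rather than stored, switching the base text requires no new editorial work, only a new run of the collation.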
- Insofar as possible, software should be used instead of manual
techniques. Thus, instead of encoding, for example, morphological
information at the word level, or lemmatizing texts manually, parsers,
lemmatizers, or machine-readable dictionaries external to the text could
be employed. Software of this sort is not yet widely available and, when
it is, may not necessarily fulfil an edition's requirements for accuracy.
It is likely that the development of sophisticated software tools will be
the single most important factor in facilitating the creation of
sophisticated electronic scholarly editions. Any such software should have
the capability of specifying and storing rules for any actions it carries
out and following them without exception.
- CONCEPTION AND PLAN OF EDITION. The content of an electronic
edition differs little from that of a print edition. It should be
appropriate, complete, and coherently conceived. The criteria for what is
to be included in an electronic critical edition will generally be more
expansive than those for a comparable printed edition, because of the
computer's inherent ability to organize and manipulate large amounts of
data. In addition to materials that form part of the edition itself, an
electronic edition can also make use of existing electronic materials by
linking to them. The considerations set forth above with regard to
encoding schemes, formats, digitized facsimiles, etc., apply equally to
all of the materials listed below. The contents should:
- include logically selected, manageable textual content -- e.g., an
edition of a single work, a group of works generically or chronologically
- include, when appropriate, authorial documents in addition to basic
text(s), such as adaptations, working notes, contracts, tables of
contents, prefaces, abstracts;
- present appropriate second-party textual materials -- e.g., letters
from respondents may be desirable in an edition of letters;
- include the editorial materials required by the kind of edition
envisaged -- e.g., prefaces and acknowledgments; lists of sigla,
symbols, and abbreviations; textual essay; textual apparatus (or
the functional equivalent, e.g., hypertext links) and/or notes;
historical/interpretive essay(s); illustrations or charts, diagrams,
maps; historical/explanatory notes; appendices; bibliography;
glossary; index(es);
- be logically arranged and easy to use;
- include appropriate analytical and text retrieval tools, either as
part of the edition itself or as part of the access package for which the
edition is designed (e.g., network browsers).
- EDITORIAL METHODS AND PROCEDURES
- A thorough census of all relevant materials should be conducted.
- Although editors may use reproductions (e.g., photocopies,
microfilms, or digitized facsimiles) for preliminary editing, they should
at some point verify the accuracy of their work against the original
- Machine-readable transcriptions should be made according to an
established rationale and policy, covering, e.g., such matters as
expansion of abbreviations, use of special characters, and indication of
medium. Except in the case of very clear machine-printed modern texts,
whether photocopied or original, scanners and OCR software have not yet
proved accurate enough to replace manual transcription.
- One very reliable method of manual transcription for printed
materials is to have the same text keyed twice by two different people, who
need not know the language involved, and then to use a
collation or file-comparison program to find the differences.
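The comparison step of this double-keying method might look like the following Python sketch. It assumes that both typists preserve the lineation of the source, so that the two files can be compared line by line; the function name and the sample keyings are hypothetical.

```python
from itertools import zip_longest


def keying_differences(first: str, second: str):
    """Return (line number, first keying, second keying) where they disagree."""
    differences = []
    pairs = zip_longest(first.splitlines(), second.splitlines(), fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            differences.append((number, a, b))
    return differences


# Hypothetical keyings of the same three printed lines.
keying_one = "Of Mans First Disobedience\nand the Fruit\nOf that Forbidden Tree"
keying_two = "Of Mans First Disobedience\nand the Friut\nOf that Forbidden Tree"

for number, a, b in keying_differences(keying_one, keying_two):
    print(number, repr(a), repr(b))
```

The program only locates disagreements; deciding which keying is right still requires a person to consult the source.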
- Transcriptions should be double-checked and perfected by persons
other than the transcriber, using appropriate manual and computerized
- All significant or potentially significant forms of the text(s)
should be collated or included as machine-readable transcriptions of the
- Accuracy of the collations should be verified by comparison of
results obtained by different people using appropriate collation or file
comparison software to supplement manual proofing. Where such software is
used, it may be assumed that the collations it produces faithfully reflect
the underlying transcriptions.
- Editorial policy for defining and recording variants should be
clearly stated, preferably in the form of parameters established in the
collation software. All items defined as variants should be recorded
whether or not they are to be included in the completed edition. Such
variants will be recorded automatically if complete transcriptions of the
textual witnesses have been made and if the collation software has been
programmed to list them.
- The collation software used should be capable of filtering out
variants according to established categories (e.g., spelling,
capitalization, punctuation) and of separating or grouping the resulting
apparatus by those categories.
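Filtering and grouping by category can be sketched in Python as follows. The sketch assigns each pair of readings to the first category whose normalization makes the two agree, applying the normalizations cumulatively (punctuation, then capitalization, then spelling); that ordering, the small spelling table, and the function names are assumptions made for illustration. Only ASCII punctuation is handled.

```python
import string
from collections import defaultdict

# Hypothetical table of spelling equivalences.
SPELLING = {"browne": "brown", "loue": "love"}
_STRIP = str.maketrans("", "", string.punctuation)


def classify(base: str, variant: str) -> str:
    """Name the first normalization under which the two readings agree."""
    a, b = base.translate(_STRIP), variant.translate(_STRIP)
    if a == b:
        return "punctuation"
    a, b = a.lower(), b.lower()
    if a == b:
        return "capitalization"
    if SPELLING.get(a, a) == SPELLING.get(b, b):
        return "spelling"
    return "substantive"


def group(pairs):
    """Group (base, variant) pairs into separate lists by category."""
    grouped = defaultdict(list)
    for base, variant in pairs:
        grouped[classify(base, variant)].append((base, variant))
    return dict(grouped)


def filter_out(pairs, excluded):
    """Drop every pair whose category is in the excluded set."""
    return [pair for pair in pairs if classify(*pair) not in excluded]


# Hypothetical variant pairs, for illustration only.
PAIRS = [("dog,", "dog"), ("The", "the"), ("browne", "brown"), ("jumps", "leaps")]

print(group(PAIRS))
print(filter_out(PAIRS, {"punctuation", "capitalization", "spelling"}))
```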
- Sources of references and quotations in the authorial text(s)
should be identified, and any textual problems raised should be
- Care should be taken that the text is accurately quoted in the
textual essay, textual notes, historical essay, and explanatory notes,
preferably by hypertext linking to the quoted passage rather than by
copying it, when it becomes technically feasible to do so, so that any
change in the text is reflected in the essay.
- Proofing at every stage to safeguard accuracy is of the highest
- The editors should give serious thought to preserving and making
available the record of their editorial deliberations and the rationale
for editorial decisions.
- PARTS OF THE EDITION
- The decision to use a single or multiple base- or copy-text,
parallel texts, sequential versions, or a combination of these, should be
appropriate to the goal of the edition. Sophisticated encoding and linkage
will allow the greatest flexibility to both editor and user in deciding
and altering the presentation format.
- The form of presentation of the texts -- whether in clear text,
diplomatic transcription, facsimile, or in some other format -- should be
consistent with announced principles. Detailed encoding combined with
appropriate filtering mechanisms can allow the same base text to be
presented in a variety of different ways; e.g., as an old spelling or a
- Inclusive text should use a clear and efficient system to symbolize
or reproduce cancellations, interlineations, omissions, insertions,
- Textual Essay
- The essay should provide a clear, convincing, and thorough
statement of the edition's theoretical principles and practical
methodology, covering such matters as:
- theory of copy-text adopted;
- description of alternative candidates, if any, for basic text
(whether single, parallel, or sequential texts are presented) and
justification of selection; instructions on how to use software to select
alternative base texts;
- justification of form of presentation, whether clear text,
diplomatic transcription, or other form, and instructions on how to
convert the presentation of the text from one form to another;
- clear explanation of the policy of editorial emendation, covering
all changes made in the basic text(s) or documents, whether or not such
changes appear in the emendations list;
- rationale for including and excluding various classes of textual
variants in the apparatus, or instructions on how to use the collation
software to change the paradigms which select variants;
- explanation of treatment of ambiguously broken line-end compounds
or possible compounds in source text(s);
- clear instructions for using the textual apparatus, or the
accompanying collation programs;
- description of the character set and encoding scheme used;
- instructions for use of the text retrieval software.
- The discussion of the materials upon which the edition is based
should include the following, where appropriate:
- a survey of all forms of the text(s) relevant to the edition,
including an account of the provenance of such forms and/or artifacts;
- a record of locations of relevant manuscripts and unique printed
- identification of the specific copies used for collations,
preparation of printer's copy, etc.;
- bibliographical or codicological description of the relevant
artifacts (printed copies, manuscripts, typescripts, tear-sheets, etc.).
When possible this should be accompanied by complete digitized facsimiles
of such artifacts.
- The account of the evolution of the text(s) should include:
- the history of composition and revision, whether by the author,
scribes, editors, compositors, etc.;
- the history of publication of printed texts;
- for scribal texts, a profile of the copying habits, orthography,
and dialect of manuscript scribes.
- Critical/textual apparatus (The term "apparatus" is used here in
its broadest sense. The CSE does not require a standard format for the
- Design and Purpose of Apparatus
- The apparatus or collation software used in conjunction with the
textual essay should enable thorough study of the composition and
transmission of the text within the limits envisaged by the edition.
- The apparatus or collation software should distinguish, where
possible, between what the author has done to the text and what was done
by scribes, printers, compositors, advisors, and editors (including the
- The record of textual variants should be logical, complete, and
uncluttered; it should:
- conform to the principles announced in the textual essay;
- include variants from all authoritative or significant texts;
- make possible, when used in conjunction with the edited text(s),
the recovery of all significant forms of the text, if such is consistent
with the goals of the edition, preferably by display of the complete form
of the transcription of the originals.
- Each part of the apparatus should be self-contained;
cross-referencing of information between lists should be clear and simple
to follow, a process that can be facilitated by appropriate use of
hypertext links. Hypertext links should be coded to make clear
the distinction between textual and non-textual material.
- Encoding of apparatus where there is not a complete transcription of
all relevant witnesses should follow the TEI or other appropriate
- Parts of the Apparatus
- Record of emendations: editorial emendations -- words, spelling,
punctuation, and capitalization -- of the basic text(s) should be reported
or adequately described in a manner consistent with the stated policy of
emendation; if emendations are not individually reported, the policy must
be justified and the classes of unreported emendations adequately
- Record of alterations: the author's alterations of the text should
- Records of variants should follow the edition's stated principles
of inclusion and exclusion and should make clear the history and/or
permutations of the text. Collation software should allow the user to
modify those principles to suit his or her own needs.
- Textual notes should identify the textual problems and adequately
explain how the editors have dealt with them.
- Records of Word, Stanza, and Section Breaks
- All ambiguous line-end hyphenation of compounds or possibly
compound words in printed texts used as basic texts should be recorded; a
second list should indicate the way such compounds ambiguously broken in
the new edition should be quoted. This process will be facilitated by the
use of hard and soft (conditional) hyphens.
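The distinction between hard and soft hyphens can be illustrated with a short Python sketch that rejoins broken lines for quotation. It assumes that the soft hyphen is encoded as the character U+00AD and the hard hyphen as the ordinary ASCII hyphen, and that a line ending in a hard hyphen always continues a compound; both are assumptions of the sketch, not requirements stated in these guidelines.

```python
SOFT_HYPHEN = "\u00ad"  # assumed encoding of the soft (conditional) hyphen


def rejoin(lines):
    """Join broken lines, dropping soft hyphens and keeping hard ones."""
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith(SOFT_HYPHEN):
            # The hyphen exists only because of the line break: remove it.
            text = text[:-1] + line
        elif text.endswith("-"):
            # The hyphen belongs to the compound: keep it, add no space.
            text = text + line
        elif text:
            text = text + " " + line
        else:
            text = line
    return text


# Hypothetical lines: "water-fall" broken by a soft hyphen,
# "well-known" broken at a hard hyphen.
broken = ["a water\u00ad", "fall beside the well-", "known mill"]
print(rejoin(broken))
```

Once the editor has recorded each ambiguous break as hard or soft, the form in which the compound should be quoted follows mechanically.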
- Stanza, section, and verse paragraphs ambiguously broken at the
ends of pages in the base or copy-text should be recorded.
- Extra-Textual Materials
- Historical or critical essays and analyses, explanatory notes,
glosses, etc., should, if present:
- be clearly separated from the textual essay and complement rather
than duplicate information in the textual essay;
- dovetail smoothly with the textual essay;
- conform to a reasoned policy for length, placement, and content;
- be complete.
- Glossaries and proper-name tables or indices
- The rationale for determining entries should be clear and
appropriate both to the text and to the audience envisaged.
- The format should be clear and uncluttered.
- Cross-references should be provided for entries having alternate
- To the extent possible such tables should be electronically
generated on the basis of encoding.
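Generating a proper-name table from the encoding might be done along the lines of the Python sketch below. The element and attribute names (l with n, name with key) are modelled on TEI usage but are assumptions here, as are the sample text and the function name; well-formed XML is used because Python's standard library parses XML but not SGML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical encoded lines in which each name carries a regularized key.
SAMPLE = """<text>
  <l n="1">Then <name key="Arthur">Arthur</name> spoke to <name key="Gawain">Gawayne</name>,</l>
  <l n="2">and <name key="Gawain">Gawan</name> answered the king.</l>
</text>"""


def name_index(xml_source):
    """Map each regularized name to (line numbers, attested forms)."""
    lines = defaultdict(list)
    forms = defaultdict(set)
    for line in ET.fromstring(xml_source).iter("l"):
        for name in line.iter("name"):
            key = name.get("key")
            lines[key].append(line.get("n"))
            forms[key].add(name.text)
    return {key: (lines[key], sorted(forms[key])) for key in sorted(lines)}


for key, (where, spellings) in name_index(SAMPLE).items():
    print(key, where, spellings)
```

Because the attested forms are collected with each entry, the same pass over the encoding also supplies the material for cross-references between alternate forms.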
- PREPARATION FOR PUBLICATION
- All necessary permissions to publish the material must be obtained
from the owners and copyright holders.
- The editor and the publisher should agree on the encoding scheme
and software to be used and the publisher should at an early stage see a
- The editor and the publisher should understand one another's special
requirements for publishing electronic scholarly editions, including:
- the particular design requirements of the formatted edition and, if
applicable, the format of the series as a whole;
- special aspects of the production schedule, including:
- the amount of time to be allowed for multiple proofreadings and for
necessary final collations.
- Final responsibility for maintaining the accuracy of the text
during production must be clearly assigned.
- Adequate resources should be allotted, and a comprehensive plan for
proofreading should be developed, taking into account:
- how proof will be read -- by whom, how many times, and against what;
- which stages of proof will be read by the editor(s).
- Final collations or checks should be carried out to ensure that no
unauthorized changes have been made in the final electronic files in
proof. Spell checkers and word lists are useful for spotting anomalies, but
all changes must be verified.
- Use of Electronic Files
- Since electronic files will be used for the formatted edition, the
editor and publisher should agree about:
- the choice of software and platform, bearing in mind problems such
as the linking of notes with text, nonstandard characters, etc. (ideally,
an edition should be available on as many platforms as possible);
- the extent to which the encoding scheme chosen will allow or
facilitate subsequent publication in other formats, e.g., print;
- who is responsible for inserting final changes or corrections in the
file -- the editor, the publisher, or third-party technical staff.
- Arrangements should be made for retaining and archiving the
- Consideration should be given to publication of the edition in a
variety of formats, including print.
- If the electronic files are to be translated to a system that will
drive the typesetting machinery for a subsidiary printed edition, the
resulting proofs should be checked as they normally would be.
- Indexing: in addition to full text retrieval software, consideration
should be given to the encoding of items to be indexed (e.g., proper
names); and appropriate software for retrieval of indexed items should be
- Reformatting: To facilitate reformatting, editors and publishers
- making archived electronic files available for reformatting;
- encoding the apparatus and editorial materials in such a way that they can
easily be omitted, if desired, from reformatted versions;
- licensing libraries to extract data in order to integrate it into
locally-based electronic text collections;
- facilitating extraction of the text in a variety of formats (e.g.,
SGML, non-encoded ASCII) so that scholars may use the text with other
software packages or tools.
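Extraction of a non-encoded text from the encoded form can be sketched in Python as follows. The element names and sample content are hypothetical, and well-formed XML again stands in for the archival encoding. The second function merely reports characters outside the 128-character ASCII range, so that an editor can decide how to represent them; it does not transliterate anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded text, for illustration only.
SAMPLE = """<text>
  <l>Sing, <name>Muse</name>, the wrath</l>
  <l>of <name>Achilles</name></l>
</text>"""


def plain_text(xml_source: str) -> str:
    """Return the text of each line with all markup removed."""
    root = ET.fromstring(xml_source)
    lines = ["".join(line.itertext()).strip() for line in root.iter("l")]
    return "\n".join(lines)


def non_ascii(text: str):
    """List the distinct characters that fall outside 7-bit ASCII."""
    return sorted({ch for ch in text if ord(ch) > 127})


extracted = plain_text(SAMPLE)
print(extracted)
print(non_ascii(extracted))
```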
PLEASE SEND COMMENTS TO
The Electronic Scholarly Editions Listserv:
For the Committee on Scholarly Editions
Charles B. Faulhaber
The Bancroft Library
University of California
Berkeley, CA 94720-6000
December 1, 1997