The Berkeley Project, 1993-95
Nearly two years ago, the Library at the University of California at Berkeley received funds from the Department of Education to investigate the desirability and feasibility of developing an encoding standard for electronic versions of archival finding aids. The study was inspired by a recognition that archival repositories wished to expand and enhance network access to information about their holdings beyond that available in MARC catalog records, and that efforts to do so would likely be more successful if they were coordinated and standards-based.
In consultation with several archivists who had expressed an interest in the Berkeley Project, principal investigator Daniel Pitti identified a number of requirements that would need to be satisfied by any technique used to deliver expanded and enhanced archival description to network users. These include the ability to present extensive and interrelated descriptive information typically found in archival finding aids, the ability to preserve the hierarchical relationships that exist between levels of descriptive detail, the ability to represent descriptive information that is inherited by one hierarchical level from another, the ability to navigate within a hierarchical information architecture, and the ability to conduct element-specific indexing and retrieval.
Candidate techniques considered by the Berkeley investigation included Gopher presentation of ASCII data, HTML (Hypertext Markup Language) tagging of data, MARC tagging of information, and tagging of text in conformance with SGML (Standard Generalized Markup Language). The last of these techniques, an international standard (ISO 8879), emerged from the analysis as being able to meet all of the functional requirements of archival finding aids, and as being supported by a large and growing number of software products that run on a variety of platforms. Based on these results, Pitti and his colleagues at Berkeley elected to test the use of SGML in encoding archival finding aids.
Standard Generalized Markup Language is a set of rules for defining and expressing the logical structure of documents, and thereby enabling software products to control the searching, retrieval, and structured display of those documents. The rules are applied in the form of codes (or tags) that can be embedded in an electronic document to identify and establish relationships among component parts. Because consistent tagging of like documents is key to successful electronic processing of them, SGML encourages such consistency by introducing the concept of a document type definition (or DTD). A DTD prescribes the ordered set of SGML tags available for encoding the parts of each example in a class of similar documents. Archival finding aids, which share similar parts and structure, form a class of documents for which a DTD can be developed.
Pitti undertook development of a finding aids DTD by analyzing numerous examples forwarded to Berkeley by archivists who had responded to requests for cooperation. He found the greatest similarities in structure among those finding aids commonly referred to as inventories and registers; these structural similarities delineated the model finding aid that formed the basis of his draft DTD. The March 1995 version of the Berkeley Finding Aid Project (BFAP) DTD defined a class of documents that, in general, consist of an optional title page, the description of a unit of archival material, and optional back matter. A title page conforming to the draft DTD could comprise any of a number of taggable elements, such as repository or finding aid type. A DTD-conformant unit description could comprise a brief description of the unit (incorporating taggable elements analogous to those of a MARC catalog record), a longer narrative description of the unit and any segregable parts (incorporating such taggable elements as title, dates, and scope and content), and formatted container lists.
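The idea of a DTD as an ordered set of permitted parts can be illustrated with a minimal sketch. It is not the BFAP DTD: the element names (`findaid`, `titlepage`, `unitdesc`, `backmatter`) are hypothetical, XML stands in for SGML so that Python's standard library can parse the sample, and the toy content model captures only the top-level shape described above (optional title page, unit description, optional back matter).

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: for each element, the children it may contain,
# in order, each flagged as required (True) or optional (False).
CONTENT_MODEL = {
    "findaid": [("titlepage", False), ("unitdesc", True), ("backmatter", False)],
}


def conforms(element):
    """Return True if the element's direct children follow the content model."""
    model = CONTENT_MODEL.get(element.tag)
    if model is None:
        return True  # no rule recorded for this element in the toy model
    children = [child.tag for child in element]
    position = 0
    for name, required in model:
        if position < len(children) and children[position] == name:
            position += 1  # expected part is present, move on
        elif required:
            return False  # a required part is missing or out of order
    # Any leftover children are parts the model does not allow here.
    return position == len(children)


valid = ET.fromstring("<findaid><titlepage/><unitdesc/><backmatter/></findaid>")
minimal = ET.fromstring("<findaid><unitdesc/></findaid>")
misordered = ET.fromstring("<findaid><backmatter/><unitdesc/></findaid>")
```

Here `conforms(valid)` and `conforms(minimal)` hold, while `conforms(misordered)` does not, because the back matter precedes the unit description. A real SGML parser checks far richer content models, but the principle is the same: software can rely on the structure because every document in the class is validated against one definition.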
As the BFAP DTD took shape, it was tested in the encoding of electronic finding aids. By March 1995, a critical mass of encoding had been achieved, and the results (involving nearly 200 finding aids from 15 repositories) were shared with a group of 50 archivists and manuscript librarians invited to a Finding Aids Conference jointly sponsored by the Library of the University of California at Berkeley and the Commission on Preservation and Access. Conference attendees observed that SGML encoding of local and networked online finding aids could simplify, improve, and expand access to archival collections by making it possible to link catalog records to finding aids, by enabling searches among pools of networked finding aids, and by allowing keyword retrieval to locate folders or items previously buried in container lists. Attendees encouraged Pitti to pursue adoption of the approach as a standard by the archival profession.
Hoping to strengthen the case for profession-wide adoption of a BFAP-like, SGML-based encoding standard, Pitti sought the assistance of a team of experts in archival descriptive standards augmented by an expert in SGML encoding techniques who could collaborate in a critique and refinement of the BFAP approach. Successful application was made to the Bentley Library Research Fellowship Program for a week-long meeting of Team Pitti in Ann Arbor, Michigan, in July 1995. The team agreed to collaborate in the production of 1) finding aid encoding standard design principles; 2) a revised finding aid data model; 3) a revised finding aid document type definition; 4) finding aid encoding guidelines and examples; and 5) an article describing the team's understanding of the structure and content of finding aids.
Team Pitti reached early agreement on the principles that would underlie their design of an encoding standard. These principles (designated the Ann Arbor Accords) are reproduced at the end of this progress report. With the Accords in mind, the Bentley group proceeded to review the structure of the document to be encoded. They agreed that at the most basic level, a finding aid document consists of two segments: a segment that provides information about the finding aid itself (its title, compiler, compilation date) and a segment that provides information about a body of archival material (a collection, a record group, or a series). Following the example of the Text Encoding Initiative (TEI), the group designated the former segment the "header." Within the latter (or finding aid) segment two types of information may be presented: 1) hierarchically organized information that describes a unit of records or papers along with its component parts or divisions and 2) adjunct information that may not directly describe records or papers but that, nevertheless, facilitates their use by researchers (e.g., a bibliography). The hierarchy of descriptive information, reflecting archival principles of arrangement, generally begins with a summary of the whole and proceeds to delineation of the parts as a set of contextual views. Descriptions of the parts inherit information from descriptions of the whole.
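The notion that descriptions of the parts inherit information from descriptions of the whole can be sketched as a small data structure. This is an illustrative model only, not part of the Bentley group's data model: the class name `Component`, the method names, and the sample field names and values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    """One level of description: a collection, a series, a folder, and so on."""
    title: str
    fields: Dict[str, str] = field(default_factory=dict)
    # parent is excluded from repr and comparison to avoid infinite recursion
    parent: Optional["Component"] = field(default=None, repr=False, compare=False)
    children: List["Component"] = field(default_factory=list)

    def add(self, child: "Component") -> "Component":
        """Attach a component part beneath this level and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def resolve(self, name: str) -> Optional[str]:
        """Look up a descriptive field here, else inherit it from an ancestor."""
        node: Optional[Component] = self
        while node is not None:
            if name in node.fields:
                return node.fields[name]
            node = node.parent
        return None


# Hypothetical three-level description: collection, series, folder.
collection = Component("Example Family Papers",
                       {"repository": "Example Library", "dates": "1900-1950"})
series = collection.add(Component("Correspondence", {"dates": "1910-1920"}))
folder = series.add(Component("Folder 1"))
```

In this sketch `folder.resolve("repository")` returns the value recorded once at the collection level, while `folder.resolve("dates")` returns the narrower range stated for the series. The nearest level that states a value wins, which is one plausible reading of inheritance in a hierarchical description.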
Agreement on this overall structure enabled Team Pitti to evaluate the encoded elements that had been incorporated in the BFAP model. Those elements that survived the evaluation process formed two categories: elements that would be tagged at specific, predictable points in the description of units or component parts (descriptive elements) and those elements that could be tagged anywhere within the document (generic elements). Examples of the former include the elements "title" and "extent" encoded with a specific relationship to one another within the description of a unit or one of its component parts. Examples of the latter include the elements "link" or "name" that might appear anywhere. Generic elements usually are embedded within a descriptive element. The team agreed that when elements have a close analog in the TEI guidelines, the element name and, when appropriate, the element content model should be taken from those guidelines.
SGML provides for the association of attributes with encoded elements, and Team Pitti concluded that the finding aid DTD should take full advantage of this possibility. Attributes provide an optional opportunity to make an element more specific. A small set of basic elements can be expanded through attributes in lieu of creating a large set of specific elements. For example, an attribute associated with the personal name element can specify the role of the person as creator or collector, sender or recipient.
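A brief sketch shows how one basic element qualified by an attribute can serve several purposes. The element name `persname`, the attribute name `role`, and the sample names are assumptions made for illustration, and XML stands in for SGML so the sample can be read with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a single personal name element, specialized by "role".
SAMPLE = """
<unitdesc>
  <persname role="creator">Jane Doe</persname>
  <persname role="collector">John Roe</persname>
  <persname role="recipient">Mary Major</persname>
  <persname>Richard Miles</persname>
</unitdesc>
"""


def names_with_role(xml_text, role):
    """Return the text of every persname element carrying the given role."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("persname") if el.get("role") == role]


creators = names_with_role(SAMPLE, "creator")
```

Because the role is optional, the fourth name remains a valid personal name with no role stated. A search for creators touches only the elements marked as such, without the tag set needing separate creator, collector, sender, and recipient elements.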
By combining descriptive and generic elements with attributes in a simplified document structure, the Bentley group was able to distill from the BFAP model the essential finding aid tag library. Within a few days of the week-long Bentley meeting, Pitti began to recast the accords reached in Ann Arbor into a revised data model and finding aid DTD (named EAD, or Encoded Archival Description). An overview of the early results of that drafting process is provided here following the statement of the Ann Arbor Accords. Anyone familiar with the earlier model will see that the key changes introduced in Ann Arbor are: 1) the separation of information about the finding aid into a header; 2) the distinction between the hierarchically presented unit description information and adjunct information; and 3) the replacement of the BFAP model's collection divisions and materials lists with the more open-ended concepts of recursive "component description" and a "display group" element to bind pieces of text for display in tabular form.
Team Pitti emphasized the importance of documentation, such as a tag library and application guidelines, to make the DTD viable. Such documentation should be "friendly" enough to enable users barely acquainted with SGML to apply the DTD both routinely and intermittently in their work. While the team focused on elements to ease conversion of traditional finding aids, it also reached for SGML techniques that could begin to improve the delivery of register and inventory information, particularly in an online environment. The team speculated about future possibilities involving attachment of online "help" scripts to explain descriptive practice as reflected in finding aids, links to central glossaries and shared administrative histories, and presentation of new views that might transform hierarchical data into archival family trees.
Among the topics discussed by the Bentley group were several associated with prospects for profession-wide adoption and maintenance of an encoding standard for finding aids. Recognizing that successful development of the DTD will require the participation of a broad community of archivists and archives users, the group planned to circulate widely both the Ann Arbor Accords and the revised data model based upon them. The 1995 annual meeting of the Society of American Archivists provided an excellent forum for presentation of concepts and ideas. The Society's Committee on Archival Information Exchange (CAIE) agreed to assume some responsibility for involving interested archivists by establishing a Working Group on the Encoded Archival Description DTD. Much still needs to be discussed. The tension remains, for example, between moving quickly to automate a traditional tool for some immediate retrieval gains and waiting until user surveys indicate what finding aid information is most needed in an online setting. With CAIE's help, however, it seems certain that a viable DTD for archival description can be developed, that it can be adopted as a professional standard, and that a process for maintaining it can be assured. The last of these assurances is needed if the encoding standard is to evolve to meet the challenges presented by future finding aids.
At the SAA annual meeting in Washington, D.C., in early September 1995, members of the Bentley Team met with the Committee on Archival Information Exchange (CAIE) and asked them to become formally involved in the ongoing development of EAD. CAIE agreed, and created a working group chaired by Kris Kiesling (Harry Ransom Humanities Research Center) and with representatives from the Library of Congress, RLG, OCLC, and SAA. The SAA/CAIE Encoded Archival Description Working Group (EADWG) will monitor and provide support for the ongoing development of the EAD DTD and guidelines. At the appropriate time, EADWG will initiate review of EAD by the SAA Standards Board and SAA Council. SAA Council agreed to formally request that the Library of Congress Network Development and MARC Standards Office be the maintenance agency for EAD once it has undergone thorough community review and is accepted by the community as a standard.
In October 1995, based on the work accomplished in Ann Arbor by the Bentley Team, Daniel Pitti released to a small group of "early implementors" an EAD data model for review and a "straw man" DTD for testing. Two weeks later, the LC/NDL sponsored and hosted three days of meetings to refine the data model and "straw man" DTD. Participants included most of the Bentley Team, representatives from several LC divisions, and two SGML experts. Based on decisions made at this meeting, ATLIS Consulting Group, under contract to the Library of Congress, began revision of the DTD and the creation of a tag library.
In early December 1995, the Society of American Archivists received funding from the Council on Library Resources to create application guidelines for EAD. To that end, a subset of the Bentley Team met with Anne Gilliland-Swetland of UCLA and Tom LaPorte of DreamWorks SKG on January 4-6, 1996, at UCLA to review the revised EAD and the tag library, and to outline the content of the guidelines. Further changes to EAD were also made and subsequently incorporated into the "alpha" version of the DTD.
In a letter to Susan Fox, SAA Executive Director, the Library of Congress Network Development/MARC Standards Office (ND/MSO) formally agreed to be the maintenance agency for EAD. SAA will be responsible for ongoing oversight of the standard. It is anticipated that SAA will, at the appropriate time, organize an EAD advisory committee comprising representatives from the archival and library community.
The alpha EAD DTD is completed. As soon as editing of the tag library is completed, ND/MSO will make the EAD DTD and tag library available electronically to the early implementors.
Alpha EAD will be tested while the application guidelines are being written. The Bentley Team will meet with Anne Gilliland-Swetland and Tom LaPorte in April 1996 in Berkeley to review the guidelines and reconcile the language in the guidelines and the tag library. The Beta version of EAD and an electronic version of the guidelines should be available to the entire archival and library community for testing and review in the late spring or early summer 1996.
For further information on the EAD, or information about becoming an early implementor, please contact:
1996. All rights reserved.
Document maintained at http://sunsite.berkeley.edu/FindingAids/ by the SunSITE Manager.
Last update 1/8/95.