PROJECT ABSTRACT

Applicant:

The Library
University of California
Berkeley, California 94720

Title of Project:

THE MAKING OF AMERICA II TESTBED PROJECT

Principal Investigator:
Peter Lyman
University Librarian
245 Library
University of California
Berkeley, CA 94720
(510) 642-3773
Project Director:
Bernard J. Hurley
Library Chief Scientist
245 Library
University of California
Berkeley, CA 94720
(510) 642-3773

Funding Requested: $350,000
Project Period: May 1, 1998 - April 30, 1999

Collaborating Institutions:

Project Participants:
University of California, Berkeley Library
Cornell University Library
New York Public Library
Penn State University Library
Stanford University Library
National Organizations:
National Digital Library Federation
The Corporation for National Research Initiatives
The Council on Library and Information Resources
American Council of Learned Societies
OCLC Online Computer Library Center, Inc.
Research Libraries Group

SUMMARY: THE MAKING OF AMERICA II TESTBED PROJECT

The primary resources in the nation’s research libraries, which constitute an invaluable foundation for research, are increasingly viewed as resources for teaching and learning among students in elementary school through college. The geographic distribution of library collections and the fragility and uniqueness of many primary source materials have always presented barriers to access. Heretofore, only those scholars able to travel to individual repositories have been able to utilize all of the resources relevant to their research. Digital technologies offer opportunities to overcome these problems of geographic distribution and fragility of primary sources by making digital surrogates of the materials available on the Internet where they can be used by scholars, students, and the general public.

The Making of America II Testbed Project continues and extends research and demonstration projects that have begun to develop best practices for the encoding of intellectual, structural, and administrative data about primary resources housed in research libraries. It builds most directly on development of the Encoded Archival Description (EAD), now being maintained jointly by the Library of Congress and The Society of American Archivists; it also extends the Making of America Project carried out by Cornell and Michigan. While the EAD provides a community standard for encoding finding aids, it does not provide any guidance for creating or encoding the digitized surrogates of the primary source materials pointed to by these finding aids. Therefore, the next step in the process of developing seamless access through EAD-encoded finding aids is the development of related community practices for creating and encoding the digitized versions of the primary sources. The research objectives of this proposal include:

The umbrella Making of America Project will create a digital library of primary source materials relating to the Making of America, focusing on the Gilded Age, 1876-1900. The Making of America II Testbed Project collection will be "Transportation, 1869-1900", particularly the development of the railroads and their relationship to the cultural, economic, and political development of the country. Creation of a coherent corpus of material on a focused theme will enable testing of end-user acceptance of the methods developed during the project.

THE MAKING OF AMERICA II TESTBED PROJECT

TABLE OF CONTENTS

  1. Significance of the Proposed Project
    1. Overview:
      1. Need
      2. National Priority
      3. Value for Research, Education, and Public Projects in the Humanities
      4. Nature, Size, Intellectual Content, of Collection
    2. Research Objective: Community Standards for the Creation and Use of
      Digital Library Materials
    3. Research Objective: Naming Conventions and Systems

  2. History of Prior Research
    1. Overview
    2. The Berkeley Finding Aid Project
    3. The California Heritage Digital Image Access Project
    4. The American Heritage Virtual Digital Archive Project
    5. The UC EAD Project
    6. Other Related Work

  3. Methodology and Standards
    1. Methodology Overview:
    2. Selecting Collections for the Project
    3. Accessing the Digital Library: The Users' Perspective
    4. Metadata Research
      1. Defining Structural Metadata for MOA II Digital Objects
      2. Defining Administrative Metadata for Images
    5. Research on Naming Conventions and Systems
      1. The Need for Naming Conventions and Systems
      2. Naming Convention Research Methodology
    6. Standards and Best Practices

  4. Plan of Work
    1. Overview
    2. Planning Phase
    3. Research and Production Phase
    4. Dissemination Phase
    5. Project Time Table
    6. Management Plan
    7. Digital Technology Plan

  5. Staff Qualifications
    1. Principal Investigator
    2. Project Director
    3. Technical Consultant
    4. Project Manager
    5. Other Key Personnel

  6. Evaluation
  7. Dissemination of Research
    1. Dissemination Plan
    2. Continuation of Work After the Termination of NEH Funding

  8. Project Budget

  9. Appendices
    1. Sample Discovery And Navigation System:
      • Collection-level Records
      • Encoded Archival Descriptions
      • Digital Images
    2. The National Digital Library Federation
    3. The Making of America II Project, Background
    4. Subcontractor Proposals and Budgets
    5. The Encoded Archival Description
    6. Architecture for Information in Digital Libraries
    7. Previous Research:
      • The Berkeley Finding Aid Project
      • California Heritage Project
      • American Heritage Virtual Digital Archive Project
      • UCEAD
    8. Related Work:
      • Making of America
      • J-STOR
      • American Memory
      • Dublin Core
    9. Naming and Architecture:
      • IETF URN
      • OCLC PURL
      • CNRI Handle
      • OMG CORBA
      • Microsoft DCOM
      • RLG Arches
    10. Structural Metadata: E-Bind
    11. The Interactive Media Group, Cornell University
    12. History of Grants
    13. Resumes of Project Participants; Advisory Board
    14. Suggested Evaluators

  1. SIGNIFICANCE OF THE PROPOSED PROJECT

    1. Overview: The primary resources in the nation's research libraries, which constitute an invaluable foundation for research, are increasingly important as resources for teaching and learning among students in elementary school through college. For decades, libraries have understood the importance of sharing information. The most visible examples of this philosophy are the sharing of collections via interlibrary loan and of catalog records that provide access to library collections. The foundation of sharing has been built on the creation of community practices and national standards that define the information and processes required to share library materials. The prime example is the development of the USMARC national standard, which defines a library catalog record. Once the catalog record was standardized, it became possible to create online union catalogs which now include records from libraries located all over the world. These union catalogs then became the primary tool for implementing modern interlibrary loan programs, as a library user could find out which libraries owned a particular item and then request it through interlibrary loan.

      However, because of their uniqueness or rarity, value, and fragility, primary resources housed in special collections throughout the world have never been lent in the ways that published materials have been. The geographic distribution of these collections has always been a barrier to research, and therefore only those scholars able to travel to the individual repositories have been able to utilize all the resources relevant to their research; primary resources have typically been unavailable to younger students and members of the general public.

      The Making of America II (MoAII) Testbed Project is significant because it will continue the applied research necessary to develop best practices and standards that will guide the creation of large-scale digital libraries which library users of all kinds, from anywhere there is a network connection, can easily use for teaching, learning, or research.

      1. Need: The emerging national digital library has the potential to elevate resource sharing to a new level, as it will be possible for users anywhere to find and use entire books, journal articles, and primary source materials directly over the Internet. Through digital technology, the problems of geographic distribution of primary sources can be overcome so that these resources can be more widely exploited for humanistic research, teaching, learning and public dissemination. However, this potential will be realized only if the library community agrees to new practices and standards that will allow digital library materials to be easily located and used. Without community standards, each library will store its electronic content in a proprietary format in proprietary computer systems.

        The result will be that users will have to search each of these systems, located at many institutions, to find and use the digitized materials they require. In short, we will have created a series of proprietary collections and missed the opportunity to create a national digital library.

        To create a national digital library, it will be necessary to define: a) community standards for the creation and use of digital library materials; and b) a national software architecture that allows digital materials to be shared easily over the network. It is possible to pursue both these goals concurrently.

        The National Digital Library Federation (NDLF: see Appendix B), a program of the Council on Library and Information Resources, was created to help promote opportunities and address problems inherent in the creation of digital libraries. Five of its sponsors are joining in the present proposal to begin to address these two issues. They will work together with other NDLF sponsors, the Research Libraries Group (RLG), the Corporation for National Research Initiatives (CNRI), OCLC, five Council on Library and Information Resources/American Council of Learned Societies (CLIR/ACLS) Taskforces to Define Research Requirements of Formats of Information, The Library of Congress and others to develop best practices that will be required to implement a national digital library system.

      2. National priority: Many libraries have already begun to create digital library resources, but few have systematically addressed the issues of best practices and digital library architecture. The Library of Congress has implemented its National Digital Library Program, The Getty Information Institute is constructing community resource libraries, commercial entities are creating digital products, and individual research libraries are experimenting with digital technologies to solve various problems and provide services. The longer libraries wait to begin to develop community agreements about best practices and system architecture, the more difficult it will be to create distributed digital libraries that appear as an integrated whole to the user. Without seamless, integrated access, scholarly productivity will suffer, and the availability of useful resources will be obscured.

        The increasing national commitment to K-12 education, combined with the current national emphasis on strengthening distance education through improved uses of educational technology, also motivates the participants to begin work on this project. K-12 teaching and learning, as well as distance education, require that core information resources be made available in digital form so that they can be used in the classroom and from remote sites. The Library of Congress has found that its American Memory project can be effectively used in many different ways by teachers and students alike. The digital collection constituting the MoAII Testbed Project, like that in American Memory, will be available to schoolchildren and their teachers as well as scholars.

      3. Value for Research, Education, and Public Projects in the Humanities: The larger Making of America Project (see Appendix C for background information) will create a collection of vital importance for the study of American history, including its culture, peoples, technology, political and social movements, and arts. The digital collection will be created from among the resources housed in libraries throughout the country, beginning with those of the five participants in this particular project. The collection will complement other digital collections--for example, the Library of Congress’s American Memory project, RLG’s Studies in Scarlet, the American Heritage Project, and the California Heritage Project. By creating a critical mass of materials in a focused subject area, the participants will ensure that the project’s collection can serve as a testbed for evaluating the potential of the digital library to serve the needs of scholarly research as well as K-12 teaching and learning. Humanities resources are particularly useful as a testbed for the digital library because they exist in many different formats (for example, print, manuscript, pictorial, multi-media, artifactual) and are widely used by scholars, independent researchers, school children, and the general public.

      4. Nature, Size, Intellectual Content of Collection: Within the general framework of the MoAII Project, the MoAII Testbed Project proposed here will focus on the specific subject area of "Transportation in the United States, 1869-1900", with a particular emphasis on development of the railroads. This topic was chosen for the rich range of scholarly explorations it offers. From 1869 to 1900, the United States was knit together through forms of transportation of all kinds, but particularly the railroads. Railroads formed the basis for distribution of industrial products, opened the West to settlement, served as a foundation for the economy of the nation, and fostered the inclusion of new ethnic groups into the American populace. The five participating libraries all have important collections in this area, and these dispersed collections will represent a stronger corpus of material if brought together into a single collection, as is made possible through digital technology. The digital library collection to be created through this project will encompass many different forms of documentation (for a full description, see individual institutions’ proposals in Appendix D), including printed materials, broadsides, pictorial collections, manuscripts, diaries, letters, etc. More than 15 individual collections, including at least 25,000 digital surrogates of documents from those collections, will be represented in the database by the end of the project.

        Moreover, because preliminary community agreements will have been reached about the description of the various categories of objects, administrative metadata for digital images, and naming conventions, the collection will have greater intellectual integrity than it would if the institutions were acting independently, and the digital library can grow over time, expanding its subject coverage; adding important collections; and incorporating a wide variety of contributors including libraries, museums, publishers, and others.

    2. Research Objective: Community Standards for the Creation and Use of Digital Library Materials: If digital library materials are to be easily shared across the network, they must be standardized to ensure they can be located, navigated, and displayed by different computer systems. NDLF has agreed that the first step in this process is to define the different classes of digital library materials that may exist. For example, a separate class could be defined for a digital surrogate of a book, a manuscript, a music score, etc. Once each class is identified it is necessary to define the elements that describe each digital object in that class. Three types of description, or metadata, are necessary:

      1. Intellectual Metadata includes the elements that describe the content of each digital object. For example, a digital book object would be described by its catalog record and include elements such as author, title, publisher, etc.; the Encoded Archival Description (EAD) provides intellectual metadata for collections of objects (see Appendix E). Intellectual metadata is used to discover materials of interest. The present project will not directly address investigations into questions of intellectual metadata, but will rely on existing standards and practices.

      2. Structural Metadata is the information that describes the internal organization of each digital object in a particular class of materials so that the user can effectively and efficiently navigate through the object. In the example of a digital book, the structure could include a title page, preface, table of contents, chapters, pages, index, colophon, etc. Structural metadata is primarily used by computers to navigate that object for a user, for example, to turn to the next page or jump to chapter three.

      3. Administrative Metadata is used to record information about the digital object including technical specifications related to image capture, enhancements, or other derivative actions on the digital object; and other characteristics that must remain with the object to ensure its long term retention and use and protect intellectual property rights.
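
      As an illustration only, the three types of metadata described above could be carried together in a single record for each digital object. The sketch below is an assumption, not a format defined by this proposal or by NDLF: the class name, field names, and sample values are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for one digital library object (illustrative only)."""
    object_class: str                                    # e.g. "book", "manuscript"
    intellectual: dict = field(default_factory=dict)     # used to discover the object
    structural: list = field(default_factory=list)       # used to navigate the object
    administrative: dict = field(default_factory=dict)   # used for retention and rights


def missing_metadata(obj):
    """Return the names of the metadata groups that are still empty."""
    groups = {
        "intellectual": obj.intellectual,
        "structural": obj.structural,
        "administrative": obj.administrative,
    }
    return [name for name, value in groups.items() if not value]


# Placeholder values; none of these come from a real catalog record.
book = DigitalObject(
    object_class="book",
    intellectual={"author": "Example Author", "title": "Example Title"},
    structural=["title page", "preface", "table of contents", "chapter 1", "index"],
    administrative={"capture_note": "placeholder", "rights": "placeholder"},
)
```

      A check such as missing_metadata(book) shows one way a repository could confirm that an object carries all three kinds of description before it is shared.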

    3. Research Objective: Naming Conventions and Systems: In addition to defining community standards for the creation and use of digital library materials, NDLF intends to work with others in the community to address system architecture issues that must be resolved if a national digital library is to be realized. Given the potential number and size of digital library objects which will be created, the most likely architecture is one that distributes the objects across the network in a number of separate repositories (i.e., databases). A key problem in this architecture is naming each object so it can be found once a user decides s/he wants to look at it. The MoAII Testbed Project will evaluate proposed naming standards and systems that will address this critical component of a national digital library architecture.
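
      The naming problem described above can be pictured as a level of indirection between a persistent name and the repository that currently holds the object. The following is a minimal sketch under that assumption; it does not reproduce the syntax or behavior of any of the naming systems the project will evaluate, and the names and addresses in it are invented for illustration.

```python
class NameResolver:
    """Hypothetical registry that maps persistent object names to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, name, location):
        # Registering a name again simply records the object's new location.
        self._locations[name] = location

    def resolve(self, name):
        try:
            return self._locations[name]
        except KeyError:
            raise LookupError(f"no repository registered for {name!r}") from None


resolver = NameResolver()
# Illustrative name and addresses only.
resolver.register("example-name/object-0001", "https://repo-a.example.org/objects/0001")
# The object moves to another repository; its name stays the same.
resolver.register("example-name/object-0001", "https://repo-b.example.org/objects/0001")
```

      Because users and finding aids would cite only the name, moving an object between repositories changes the registry entry and nothing else.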

  2. HISTORY OF PRIOR RESEARCH

    1. Overview: Berkeley, the lead institution for this project, has conducted a number of successful research projects that have explored metadata and technical issues relating to the creation, access, navigation, and use of digital facsimiles created from primary source materials. These prior projects have included broad community involvement with the objective of establishing agreement on a set of best practices for the creation of digital libraries. The present project builds on this prior research, as well as that of other entities. A description of the more important prior research follows.

    2. The Berkeley Finding Aid Project (Appendix G) was a collaborative endeavor to test the feasibility and desirability of developing an encoding standard for archive, museum, and library finding aids (documents used to describe, control, and provide access to collections of related materials). It involved two interrelated activities: 1) the design and creation of a prototype encoding standard for finding aids, and 2) building a prototype database of finding aids. The project was supported by Higher Education Act Title IIA funds, and software grants from Electronic Book Technologies and ArborText. A large selection of the finding aids is available on the Berkeley Digital Library SunSITE at http://sunsite.berkeley.edu/FindingAids/. Subsequent funding by the Commission on Preservation and Access and the Bentley Historical Library resulted in adoption of EAD by the Society of American Archivists, and its maintenance as a standard by the Library of Congress.

    3. The California Heritage Digital Image Access Project (Appendix G) was funded by the NEH to demonstrate that USMARC collection-level cataloging records and standardized, electronic versions of archival finding aids, used together in the network environment, can provide access to and control of digitized images. The project created and tested a prototype digital image access system available on the Internet based upon SGML finding aid technology developed in the Berkeley Finding Aid Project. (See http://sunsite.berkeley.edu/CalHeritage/).

    4. The American Heritage Virtual Archive Project (Appendix G) is an NEH-funded project to develop a demonstration union database providing online access to distributed digital library resources. The four major objectives are: 1) to develop protocols that enable physically dispersed collections to appear as one; 2) to develop mechanisms to navigate related collections; 3) to develop prototype standards, policies and procedures for remote collaborative creation and maintenance of a union database of catalog records and finding aids that lead to digital images of primary source materials; and 4) to investigate, develop, and test mechanisms to ensure seamless access and navigation through the catalog and finding aids to images housed on remote servers. (See http://sunsite.berkeley.edu/amher/).

    5. The UCEAD Project (Appendix G) is testing production systems for creating EAD records for libraries within all nine campuses of the University of California; the union database will form the foundation for the development of a full-scale digital archive for the University of California System. (See http://sunsite.berkeley.edu/FindingAids/uc-ead/).

    6. Other Related Work (Appendices H & I): There is a large body of other related work that will inform the Making of America II Testbed Project. Some of the more important include: The Making of America Project (Michigan and Cornell), Michigan's structural metadata work (including J-STOR), OCLC/CNI work on the Dublin Core metadata set and the Library of Congress’s American Memory Project. Related work on digital library architectures includes LC/CNRI investigations into digital library architectures and object naming conventions (i.e., the Handle System), OCLC's PURL system for naming digital library objects, the OMG CORBA system, Microsoft’s DCOM, and RLG's Arches system. Project investigators will closely watch the NSF/DLI projects for applicable solutions, as well as the Larson/Watry research into cross-domain searching. Work of the Text Encoding Initiative, CIMI, Getty Information Institute and others will be regularly monitored. Of particular importance will be the work of the CLIR/ACLS scholarly task groups.

  3. METHODOLOGY AND STANDARDS

    1. Methodology Overview: The MoAII Testbed Project’s methodology is significant for three reasons. First, it continues applied research necessary to form the foundation for the development of large-scale, usable distributed digital libraries by creating a testbed where digital object metadata and naming conventions and systems can be evaluated. Second, the methodology is designed to disseminate the results of this project’s research into the library community as a basis for the development of agreements about community standards for the creation of those digital libraries. The MoAII Testbed Project initiates this process by investigating best practices for naming conventions and for certain classes of administrative and structural metadata, and then provides the testbed where these definitions can be evaluated. Finally, it initiates the creation of a focused digital library collection, capable of being augmented through the years by many additional institutions. The methodology for this project:

      1. Involves the library community by bringing together the participants, scholars, NDLF sponsors, libraries, experts and national organizations to understand and agree upon metadata and architecture issues.

      2. Creates a testbed that links a union catalog of MARC collection level records to a union catalog of EADs, then to distributed repositories of digital surrogates of important primary source materials.

      3. Evaluates the research by using the testbed to understand architecture issues related to the naming of digital objects and definition of best practices for metadata. Equally important, the testbed will allow humanities scholars and students to evaluate the usefulness of these approaches and to advise the library community on research directions.

      4. Disseminates the results not only through the digital library it creates, but also through the project website, widely-distributed drafts of proposed practices, scholarly papers, and a national invitational conference that will summarize the research of this project and set future goals and research agendas.

        The methodology to be used in this project builds on previous work and is intended to serve as an essential component of the National Digital Library Federation's exploration of digital library systems that will deliver information seamlessly to users, regardless of where it resides on the network. The present project is expected to lead to future research: for example, identifying and defining additional classes of digital library objects; investigating ways of providing multiple views of the digital content (for example, enabling school teachers to attach interpretative or curricular materials to scholarly resources); archiving and migration; and further refinement of a technical architecture for digital information object repositories.

    2. Selecting Collections for the Project: In order to create a testbed and prototype digital library, the participants have chosen to digitize at least 25,000 images from important collections relating to the topic of "Transportation in America, 1869-1900", with a focus on development of the railroads. Participants expect that the virtual collection resulting from the project will have two main values. First, it will be available to scholars, undergraduates and K-12 students as examples of primary historical documents which can be used for a variety of research and classroom purposes; and second, it will form the kernel of a broader MoAII collection, to which additional institutions can contribute. For a description of the overall focus of the proposed MoAII digital library, see Appendix C; for descriptions of the individual collections to be digitized relating to the theme of transportation, see Appendix D. Collections, and individual items in the collections, will be selected for their current and anticipated use, relationship to other similar items in the collections of other participants, and risk of deterioration from use of the originals. Curators of the collections will consult with faculty or scholars as necessary during the selection process. In addition, the selection will be informed by work of the CLIR/ACLS Task Groups on Research Requirements for Various Formats of Material (to be formed).

    3. Accessing the Digital Library: The Users’ Perspective: Project participants will create a multi-level digital library structure in which the highest level of access is through a union catalog of USMARC collection-level catalog records accessible through a web interface, using OCLC’s Site-Search software. The USMARC records will point to an intermediate level of access consisting of Encoded Archival Descriptions (i.e., finding aids for primary resource materials) of these collections. The Encoded Archival Descriptions (EADs), which will be housed in a union database for the purposes of cross-collection searching, will then point to digital surrogates of primary resources. For examples of the prototype discovery, navigation, and access system, see Appendix A.

      Users of the prototype digital library will have the option of beginning their search at the highest level of indexing (the catalog record), or at the intermediate level (the collection of EADs). Users who have links such as a URN (Uniform Resource Name) for a specific EAD or digital surrogate can go directly to that item (i.e., a known item search). Users who encounter a link to the EAD or digital surrogate while browsing will also be able to follow that link to the item. A more technical description of the access methodology can be found in Section IV.F.2, The Digital Technology Plan: Access Overview.
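
      The three access levels and the known-item shortcut described above can be sketched as a chain of pointers. This is an illustrative assumption only: the in-memory dictionaries stand in for the union catalog, the union database of EADs, and the image repositories, and every identifier, title, and address is hypothetical.

```python
# Hypothetical stand-ins for the three access levels (illustrative data only).
CATALOG = {
    "collection-1": {"title": "Example railroad photograph collection", "ead_id": "ead-1"},
}
FINDING_AIDS = {
    "ead-1": {"title": "Guide to the example collection",
              "surrogate_ids": ["image-1", "image-2"]},
}
SURROGATES = {
    "image-1": "https://repository.example.org/images/1",
    "image-2": "https://repository.example.org/images/2",
}


def search_catalog(term):
    """Highest level: ids of collection-level records whose title contains the term."""
    needle = term.lower()
    return [cid for cid, record in CATALOG.items() if needle in record["title"].lower()]


def finding_aid_for(collection_id):
    """Intermediate level: follow the catalog record's pointer to its finding aid."""
    return FINDING_AIDS[CATALOG[collection_id]["ead_id"]]


def surrogates_for(collection_id):
    """Lowest level: follow the finding aid's pointers to the digital surrogates."""
    return [SURROGATES[sid] for sid in finding_aid_for(collection_id)["surrogate_ids"]]


def known_item(identifier):
    """Known-item search: go directly to a finding aid or surrogate by its identifier."""
    if identifier in FINDING_AIDS:
        return FINDING_AIDS[identifier]
    return SURROGATES[identifier]
```

      A search that begins at the catalog reaches the images through two pointer hops, while a user who already holds an identifier skips both.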

    4. Metadata Research: Community standards for metadata are an important element in the digital library architecture model proposed by NDLF. This model incorporates three types of metadata associated with digital library objects (for a background review of the proposed architecture, see Appendix F):

      • Intellectual metadata are used to describe digital objects so that they can be identified and "discovered". The NDLF national digital library architecture envisions intellectual metadata existing on two levels: a) USMARC records loaded into union and local online catalogs; and b) intermediary indices that are used to provide more detailed access (e.g., in this project, the EADs are an intermediate access layer). Although the MoAII Testbed Project will not focus on issues of intellectual metadata, both the USMARC record union catalog and the EAD finding aid intermediate level access layer will be provided in the MoAII Testbed Project database to allow the NDLF architecture model to be tested, modified and evaluated.

      • Structural metadata are used to define the internal organization of a given digital object so that a user can navigate within the object.
      • Administrative metadata are used to record information about the digital object which will need to remain with the object for its long term retention and use.

      The assumption of the MoAII Testbed Project is that careful definition of various classes of digital information objects is required in order to enable users of the national digital library (which will be created by geographically distributed contributors and will incorporate information resources housed in distributed repositories) to find, navigate through, and effectively use information. Project investigators will work with the broader community to define: a) practices for creating and encoding administrative metadata for digital images of primary source materials; b) practices for defining structural metadata for the digital objects used in the MoAII project; and c) alternatives for encoding the digital objects themselves (i.e., practices for encapsulating content, e.g., images, along with administrative and structural metadata inside the object).

      The development of agreements on these topics is a necessary precondition for the establishment of large-scale production-level digitization projects that can be built into an interoperable, distributed digital library that can be migrated and archived over time. Investigators will use the MoAII Testbed Project to explore alternatives for metadata practices, including the encoding of this information into digital objects and to help the broader community understand the advantages and disadvantages of various approaches.

      1. Defining Structural Metadata for MoAII Digital Objects: The EAD is a community standard for encoding finding aids. Automated systems based on the EAD have enhanced access to primary source materials by allowing scholars and students to view and navigate these digitized archival collection descriptions over the Internet. The California Heritage project demonstrated that the finding aids could be linked to individual images which act as surrogates for primary source materials. The American Heritage project demonstrated that the creation of a multi-institutional virtual archive of finding aids is not only feasible, but has the potential to enhance the use of primary sources.

        While the EAD provides a community standard for finding aids, it does not provide any guidance for creating or encoding the digitized surrogates of the primary source materials pointed to by these finding aids. Therefore, the next step in the process of developing seamless access to digitized primary source materials through EAD-based finding aids is to develop the related community practices for creating and encoding the digitized versions of the primary sources.

        Structural metadata is an important component of information that must be captured and standardized as part of any process that digitizes primary source materials, as it is this metadata that defines the internal organization of an archival object (e.g., a book, diary, manuscript, photo album, etc.). Computer programs can then use the standardized structural metadata to display and navigate that object, at the scholar’s request. For example, RLG has deployed a display/navigation software tool called Documatic that will be investigated in this project, as part of the effort to identify standard structural metadata elements.

        The internal organization of digital archival objects beyond a single image, such as those used in the California Heritage Project, can quickly become complex. In the last section, it was suggested that a digital book-like object could have structure such as chapters and pages. In fact, it will have a rich and complex organization which could include items such as title page, preface, table of contents, indices, references, pages (possibly both image and text versions of the same page), chapters, colophon, etc. This complex organization must be captured in the structural metadata and become part of the digital archival object it helps to define. In our book object example, access to the structural metadata allows a computer program to implement book-like behaviors such as turning to the next page, jumping to page twenty, skipping ahead four pages, and jumping from the table of contents to a specific chapter by clicking on the chapter name.
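
        As an illustration only, the book-like behaviors named above can be sketched in a few lines of Python. The class names, field names, and file names below are assumptions made for this sketch and are not a proposed encoding; in particular, it assumes that "page twenty" means the twentieth page image in sequence rather than a printed page number.

```python
from dataclasses import dataclass, field


@dataclass
class Division:
    """One structural unit of the object, e.g. a preface or a chapter."""
    label: str        # display name, e.g. "Table of Contents"
    first_page: int   # 1-based sequence number of the unit's first page image


@dataclass
class BookObject:
    """Hypothetical page-image book: ordered image files plus divisions."""
    page_files: list
    divisions: list = field(default_factory=list)
    current: int = 1  # 1-based sequence number of the page now displayed

    def _goto(self, page: int) -> str:
        # Clamp to the valid range so navigation never leaves the object.
        self.current = max(1, min(page, len(self.page_files)))
        return self.page_files[self.current - 1]

    def next_page(self) -> str:
        return self._goto(self.current + 1)

    def jump_to_page(self, page: int) -> str:
        return self._goto(page)

    def skip_ahead(self, count: int) -> str:
        return self._goto(self.current + count)

    def jump_to_division(self, label: str) -> str:
        # Corresponds to clicking an entry in the table of contents.
        for division in self.divisions:
            if division.label == label:
                return self._goto(division.first_page)
        raise KeyError(label)


# Usage with made-up file names: a 40-page book with two divisions.
book = BookObject(
    page_files=[f"page{n:04d}.tif" for n in range(1, 41)],
    divisions=[Division("Table of Contents", 3), Division("Chapter 2", 21)],
)
print(book.next_page())                     # page0002.tif
print(book.jump_to_page(20))                # page0020.tif
print(book.skip_ahead(4))                   # page0024.tif
print(book.jump_to_division("Chapter 2"))   # page0021.tif
```

        The navigation code depends only on the ordered page list and the list of divisions, which is the kind of information the structural metadata would have to carry.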

        The problem becomes even more complex because the structural metadata for any given object, like a digitized book, may vary based on how it was created. For example, a digital book that is created from scanned page images could have different structural information than one which had its text converted from an OCR process. The former is a series of images linked together by the structural metadata to provide book-like behaviors. The latter is a string of text characters in which the concept of a page is less clear. However, in both cases, one would still want the book to support similar behaviors, such as jumping to chapter three. Another complication that needs to be addressed is how to handle structural metadata that may also be intellectual metadata. For example, chapter headings may be indexed for searching (i.e., intellectual metadata) and also be used for navigating an object (i.e., structural metadata: e.g., jump to chapter 3).
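
        The point that differently created books should still support the same behavior can be sketched as follows. This is a hedged illustration, not a design: the two classes, the `jump_to_chapter` method, and the idea of locating chapters by page number in one case and by character offset in the other are all assumptions made for the example.

```python
from typing import Protocol


class ChapterNavigable(Protocol):
    """Any book-like object that can open at the start of a chapter."""

    def jump_to_chapter(self, number: int) -> str:
        ...


class PageImageBook:
    """Book built from scanned page images."""

    def __init__(self, page_files, chapter_first_page):
        self.page_files = page_files                  # ordered image file names
        self.chapter_first_page = chapter_first_page  # chapter no. -> 1-based page

    def jump_to_chapter(self, number: int) -> str:
        return self.page_files[self.chapter_first_page[number] - 1]


class TextBook:
    """Book held as one string of characters, e.g. from an OCR process."""

    def __init__(self, text, chapter_offset):
        self.text = text
        self.chapter_offset = chapter_offset  # chapter no. -> character offset

    def jump_to_chapter(self, number: int) -> str:
        # Return the first line of the chapter as the place to start display.
        start = self.chapter_offset[number]
        end = self.text.find("\n", start)
        return self.text[start:] if end == -1 else self.text[start:end]


def open_at_chapter(book: ChapterNavigable, number: int) -> str:
    # The caller does not need to know how the book was created.
    return book.jump_to_chapter(number)


scanned = PageImageBook(["p1.tif", "p2.tif", "p3.tif"], {1: 1, 2: 3})
text = "CHAPTER 1\nIt began.\nCHAPTER 2\nIt ended."
converted = TextBook(text, {1: 0, 2: text.index("CHAPTER 2")})
print(open_at_chapter(scanned, 2))    # p3.tif
print(open_at_chapter(converted, 2))  # CHAPTER 2
```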

        The value of providing standardized structural metadata for classes of digital library objects, such as books, manuscripts, etc., is that it makes it easy to share and use objects on a national basis. NDLF envisions that digital objects will be stored in distributed repositories spread across the country. Given the size and number of digital library objects available on a national basis, this seems the only reasonable approach. However, if each repository used its own proprietary structural metadata scheme, this would mean the computer programs used to display and navigate the objects would need to be rewritten for each repository. The result is that users would have to deal with proprietary navigation tools for each repository accessed. By standardizing the structural metadata, a single navigation tool could be used across all repositories. If a particular organization had a need to develop a specialized display and navigation tool for a class of archival object, it could be done once and would still work across all repositories.

        An important research objective for this project is to work with the participating institutions, NDLF sponsors, other research libraries and national organizations to: a) identify the classes of digitized archival source materials that require standard practices for structural metadata encoding (e.g., books, manuscripts, photographic albums, single images, diaries, pamphlets, etc.); b) describe the attributes and behaviors that these classes must exhibit, and then develop structural metadata practices that will allow them to be used in a uniform manner across repositories (for example, should all book objects support the behaviors of turning to the next page or the previous page, jumping to a particular chapter, linking from an index entry to the appropriate reference, setting and returning to a bookmark, etc.?); c) investigate encoding schemes for structural metadata, such as SGML/TEI, table-based models, etc.; and d) create the MoAII Testbed in which the proposed structural metadata standards and encoding schemes can be demonstrated widely and then evaluated by the community.

        The MoAII Testbed Project will be based on the work undertaken by Cornell and Michigan in the Making of America I Project (Appendix H) which investigated access to digital surrogates of books, and on the Berkeley E-Bind project (Appendix J) which has begun to explore structure for other archival materials (see http://sunsite.berkeley.edu/Ebind/). The goal of this research objective is to develop a national consensus on a structural metadata best-practice and/or standard for these categories of materials, as was done for finding aids (i.e., the EAD standard). The structural metadata, along with the administrative metadata, will be encoded with the digitized content of the primary source material to create a digitized archival object that can be stored in a distributed repository. The result will be to ensure that each class of digitized primary source material, once discovered by scholars and students through the EAD standardized finding aids, can be viewed and navigated in an easy and consistent manner, regardless of its internal structural complexity or source of origin on the network.

      2. Defining Administrative Metadata For Images: Another important research focus of this project will be to build consensus on issues of administrative metadata. We will focus primarily on administrative metadata created before or at the time of digital image conversion of library resources rather than on metadata associated with derivative files. During the course of this project, MoAII Testbed Project participants will draw upon the combined expertise of the group, as image technicians and end users, to define an administrative metadata set, focusing on point of capture. We believe that working on these issues with this group of experienced participants will pave the way for later decisions about such issues as system architecture and archival strategies for refreshment and migration, as well as a variety of other topics.

        During the MoAII Testbed Project, participants will compare and discover best practices for project management of data, especially keeping in mind the notion of entering/capturing metadata once and then using/reusing it for a variety of purposes. Administrative metadata must serve short term project management as well as long term file management purposes and form the basis (an "electronic colophon") for future users who need to analyze the images and their derivatives. We expect the metadata to be used by different audiences, so that any model adopted must be adaptable; the information will need to be dynamic, changing as files are used and new formats derived, as well as permanent, to carry notice of the requirements for using the files, particularly critical in thinking of technical change and obsolescence. Careful definition and encoding of administrative metadata are critical to future archiving and migration of digital objects.
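
        The distinction drawn above between permanent information recorded once at capture and dynamic information that grows as files are used can be sketched as a small data structure. This is a minimal sketch under stated assumptions: every field name below is hypothetical, and the actual element set is what the project proposes to define.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaptureRecord:
    """Permanent facts recorded once, at the point of capture."""
    source_item: str
    capture_date: str      # ISO 8601 date, e.g. "1998-08-14"
    capture_device: str
    resolution_dpi: int
    bit_depth: int
    file_format: str


@dataclass(frozen=True)
class DerivativeEvent:
    """One later event in the life of the file, e.g. a derived format."""
    date: str
    derived_file: str
    file_format: str


@dataclass
class AdministrativeMetadata:
    capture: CaptureRecord                       # fixed after capture
    history: list = field(default_factory=list)  # grows as files are derived

    def record_derivative(self, event: DerivativeEvent) -> None:
        self.history.append(event)

    def colophon(self) -> str:
        """Summarize the record for a later user analyzing the image."""
        c = self.capture
        lines = [
            f"Source: {c.source_item}",
            f"Captured {c.capture_date} with {c.capture_device} "
            f"at {c.resolution_dpi} dpi, {c.bit_depth}-bit, {c.file_format}",
        ]
        lines += [
            f"Derived {e.date}: {e.derived_file} ({e.file_format})"
            for e in self.history
        ]
        return "\n".join(lines)


# Usage with made-up values.
meta = AdministrativeMetadata(
    CaptureRecord("item-001", "1998-08-14", "flatbed scanner", 600, 24, "TIFF")
)
meta.record_derivative(DerivativeEvent("1998-09-01", "item-001.jpg", "JPEG"))
print(meta.colophon())
```

        The same record entered once at capture serves production tracking (the history list) and later analysis (the colophon), which is the reuse the paragraph above calls for.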

        At the end of the project, we expect to have a well-defined and tested set of administrative metadata for image capture. Through group discussion and usage we will be able to reach a consensus that will prove to be robust under the scrutiny of the larger community. We will integrate this with an accompanying "best practices" document. This will provide solid image metadata so that a future systems architecture group can build tools and structures for these digital resources on a solid foundation. By establishing and evaluating best practices in the MoAII testbed system, we will contribute to the creation of more permanent archival standards and procedures, constituting, in effect, community metadata standards. 1

        Although administrative metadata relating to intellectual property rights are equally important, and project staff will do background research into existing practices, it is unlikely that this project will be able fully to explore the relevant issues and propose practices.

    5. Research on Naming Conventions and Systems (see Appendix I for additional information):

      1. The Need for Naming Conventions and Systems: The NDLF envisions a national digital library architecture that will populate distributed repositories with digital library objects. While developing a fully distributed architecture for a national digital library is beyond the scope of this project, it is desirable to begin work on specific community practices that will be incorporated into such an architecture. The first of these practices that must be addressed is the naming of digital library objects. The methodology discussion to this point has described the process that creates a digital archival object by encapsulating the encoded administrative and structural metadata along with the digitized primary source material. Once a digital archival object is created, it must be named so it can be retrieved later for use by scholars and students.

        In time, the population of digital objects that act as surrogates for books, journals, manuscripts, photographic images, video recordings, audio presentations, multimedia productions, etc., will grow into the millions. A key component in realizing a national digital library architecture is developing a standard naming convention that allows each digital object to be uniquely named in order that it may be located on demand in its home repository. As important, the naming convention must be supported by a network based system that can create distributed naming authorities to manage the images and their unique names (i.e., assigning names to newly created images, renaming, moving an image to a new storage location, etc.).

        The naming convention must: a) allow for distributed naming authorities so each unique digital library object can be named by its owning institution; b) create a persistent name (e.g., a URN - uniform resource name) that will remain the same even if the object is moved to a new location on the network; and c) be able to retain information on multiple instances of an object (e.g., an object might be mirrored in another repository for performance reasons).

        In addition, the naming convention must be implemented within a name creation/resolution system that is reliable (available 24 hours per day and guaranteed by mirrored sites) and minimizes network traffic where possible. Finally, it must implement a distributed name administration service allowing each naming authority with proper security access to add, edit, and delete names for their digital library objects.
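
        The requirements above (names assigned by distributed naming authorities, names that persist when an object moves, and multiple instances of one object) can be illustrated with a toy resolver. This sketch is not the PURL or Handle design: the "<authority>/<local name>" form, the class, and its method names are assumptions made for the example, and security, mirroring of the service itself, and network behavior are left out.

```python
class NameRegistry:
    """Toy resolver mapping persistent names to one or more locations.

    Names take the hypothetical form "<authority>/<local name>".
    """

    def __init__(self):
        self._authorities = set()  # naming authorities allowed to assign names
        self._locations = {}       # persistent name -> list of locations

    def add_authority(self, authority: str) -> None:
        self._authorities.add(authority)

    def assign(self, authority: str, local_name: str, location: str) -> str:
        if authority not in self._authorities:
            raise PermissionError(f"unknown naming authority: {authority}")
        name = f"{authority}/{local_name}"
        if name in self._locations:
            raise ValueError(f"name already assigned: {name}")
        self._locations[name] = [location]
        return name

    def add_mirror(self, name: str, location: str) -> None:
        # A second instance of the same object, e.g. for performance.
        self._locations[name].append(location)

    def move(self, name: str, old: str, new: str) -> None:
        # The object moves; its name does not change.
        locations = self._locations[name]
        locations[locations.index(old)] = new

    def resolve(self, name: str) -> list:
        return list(self._locations[name])


# Usage with made-up hosts under the reserved ".example" domain.
registry = NameRegistry()
registry.add_authority("berkeley")
name = registry.assign("berkeley", "album-17", "http://repo-a.example/album-17")
registry.add_mirror(name, "http://repo-b.example/album-17")
registry.move(name, "http://repo-a.example/album-17", "http://repo-c.example/album-17")
print(name)                    # berkeley/album-17
print(registry.resolve(name))
```

        After the move, the name printed is unchanged and resolution returns the new location alongside the mirror.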

      2. Naming Convention Research Methodology: The Internet Engineering Task Force (IETF) has been investigating specifications for a URN that can be used to name digital objects on the Internet. Already, there are at least two prototype naming support systems that are based on the work of the IETF: the OCLC PURL system and the CNRI Handle System. In addition, the Object Management Group (OMG), a consortium of over 700 corporations, is developing a technology for naming and managing digital objects in distributed repositories, called CORBA (Common Object Request Broker Architecture). Finally, Microsoft is promoting an architecture that competes with CORBA called DCOM (Distributed Component Object Model). For additional information about these various naming schemes, see Appendix I.

        This project will investigate the following architecture research objectives:

        1. opportunities and shortcomings of using URNs to name digital library objects within an architecture that could support a national digital library;

        2. the functionality of the prototype systems that support URN creation, management and name resolution to determine if they are adequate for use in a national digital library architecture;

        3. the possible role that CORBA and/or DCOM naming and distributed object support systems may play in a national digital library architecture (note: these object-based technologies are not necessarily in competition with URN-based naming and could very well complement the use of URNs); and

        4. the possibility of creating best practices for digital naming conventions that are informed by the research conducted in this project and accepted by the community.

        In order to achieve these research objectives, the project will add to its testbed a CNRI Handle Server and an OCLC PURL server to be used to evaluate the issues in using URNs to name digital library objects. The testbed will also be used to investigate the functionality required in a URN-based system for creating and supporting naming authorities, creating and managing individual names for digital library objects, and resolving names into the network-based addresses needed to access each object. The project will also investigate both CORBA and DCOM technologies to see what role these may play in naming digital library objects and, if appropriate, add these technologies to the testbed.

        It is significant to note that CNRI and OCLC have agreed to participate in this project. It is expected that the use of their prototype systems in the MoAII Testbed Project will lead to enhancements in their systems that will benefit the community as a whole.

    6. Standards and Best Practices: The project will use established practices and standards where they exist (for example MARC, Z39.50, SGML and EAD) and will attempt to develop common practices and community standards where they do not yet exist. Since the project will use easily available commercial software (for example, SiteSearch, SGML authoring tools, and standard web-browsers), no software development will be needed. Limited systems integration work will need to be done.

      Because of its experience with the technology, the access model, operations, and production, Berkeley will take the lead among the NDLF collaborators and offer its methods as a basis for this project. As noted above and in Appendix G, these methods have been developed through a series of R&D projects. While Berkeley’s procedural methods will act as guidelines for the other collaborators, strict adherence to them will not be required and they may be changed generally as a result of the collaborative work. Flexibility is an important aspect of the production methodology the consortium of participants wishes to demonstrate, because in the real world the production of data is decentralized. The different procedural methods used by participants will be documented and analyzed as part of the evaluation of the project so that they can be adopted as production methods for later participants in the fully fledged MoAII Project.

      Although procedural methods will vary, project participants will actively participate in the development of, and adhere to, agreed-upon practices governing the encoding, pointing/linking, and quality of data capture (USMARC for collection-level records; EAD for finding aids; Uniform Resource Names (URN) and Uniform Resource Locators (URL) for naming resources; administrative metadata for digital images; and the structural metadata practices developed for multi-image digital objects). The use of best practices and standards is the key element that will allow the project to succeed in bringing together a diverse group of collections dispersed among five institutions into a single, coherent, access system.

  4. PLAN OF WORK

    1. Overview: This is a highly collaborative project involving five sponsors of the National Digital Library Federation (Cornell, New York Public Library, Penn State, Stanford, and UC Berkeley). Because of the number of institutions and the technical and intellectual challenges, the project is complex. The plan of operation calls for a tightly scheduled sequence of events occurring over two years and requiring the close cooperation of many staff members in a number of different departments at the many collaborating institutions. The plan of operations will be divided into three main phases:

      The planning phase (beginning July 1, 1997 and continuing until April 30, 1998) will be funded by NDLF and the participating institutions. During this year, participants will set the foundation for the work to be carried out with NEH funds beginning in May 1998. The research and production phase (beginning May 1, 1998 and continuing for one year) is the MoAII Testbed Project being proposed for funding by NEH and the participants. It is during this phase that the core research and community standards-building process will take place and the digital library prototype will be created. During the dissemination phase (immediately following the NEH-funded investigations) the NDLF will provide funding for dissemination and community review of the results. The following is a description of the work to be done in each phase:

    2. Planning Phase (July 1, 1997 through April 30, 1998): Participating institutions, at their own expense, will do the following:

      1. Make final selection of collections to be included in this project. To be completed by August 1997.

      2. Create USMARC collection-level records for resources to be included in the project. To be completed by April 30, 1998.

      3. Author finding aids for each collection selected, then mark them up using EAD. Decentralized authoring will be done using any of several commercially available SGML authoring tools. Berkeley can provide software recommendations to collaborators who desire advice. Although most participants already possess expertise in the creation of EAD records, those that do not will take responsibility for gaining it. For example, they can attend training sessions provided by RLG; alternatively, Berkeley is willing to offer a session to participants and other libraries at cost. To be completed by April 30, 1998.

      4. Gain expertise in scanning required for work in the research and production phase (second year) of the project. Most participants already have substantial experience in scanning. If necessary, Cornell University is willing to offer training to participants and other libraries at cost. Each institution will identify staff with this knowledge to serve as "technical liaisons" for the project (minimally one individual per institution). To be completed by Fall 1997.

      5. Maintain the centralized MARC and EAD records at Berkeley throughout the duration of the NDLF and NEH projects. Participants may also mount their MARC records in their OPACs for local access and/or on the bibliographic utilities. Several will also store their EADs locally, with links from local MARC records and also links to the digital images of the collections described in the EADs. Local infrastructure decisions to be made by Winter 1997.

      6. Participants will also develop and implement their own system or production database for tracking images and image metadata during the production phase of the project. Berkeley has created a local system to track images created for the California Heritage and Digital Scriptorium projects, and will share these production and tracking methods with other participants. Cornell also has expertise in this area, which will be shared. To be completed by April 1998.

      7. Staff at Berkeley and Cornell, in consultation with Dr. Howard Besser, will review imaging practices from a wide variety of projects, including, for example, Making of America, JSTOR, Digital Scriptorium, Getty Information Institute and others. Based on this review, procedures for the production phase of this project will be developed, and draft practices created by Fall 1997. On this foundation, participants will then re-assess their scanning expertise and work with Berkeley and Cornell staff to refine production plans (Winter 1997-Spring 1998).

      8. While participants are conducting the work described above, Berkeley will concurrently take the lead in developing naming as well as administrative and structural metadata proposals for participants’ review as foundation to the research and production phase of the project.

        1. Working with Dr. Besser, staff at Berkeley will identify issues, and create draft community recommendations for capture and encoding of administrative metadata. This work will involve the other participants in this project, other libraries, digital library experts, computer scientists, and consultants as appropriate and will be completed by Winter 1997.

        2. Berkeley, in consultation with Dr. Besser, and in extensive collaboration with participants, other libraries (especially Michigan and Cornell), external experts, and the Library of Congress, which has developed mechanisms for navigating through complex objects included in its digital library, will take the lead in developing proposed structural metadata models. Berkeley and the collaborators will work closely with the CLIR/ACLS Scholarly Task Forces to determine the behaviors that scholars expect from particular types of digital objects. Researchers will analyze the content proposals from the participants to identify major potential classes of information objects that will need structural metadata defined. They will develop and test prototype practices for structural metadata in the NEH project year; a draft from Berkeley is due to participants by Winter 1997.

      9. Berkeley will install a CNRI Handle Server and an OCLC PURL server for the project. Berkeley will analyze Handle, PURL, CORBA, and DCOM protocols, and consult with computer scientists as well as experts in industry to develop a thorough understanding of the various alternatives available for naming. Berkeley will collaborate with CNRI to develop proposed naming conventions and will work with the other participants to create appropriate naming authorities and to determine the exact methodology for evaluating these services. To be completed by Spring 1998.

    3. Research and Production Phase (May 1, 1998 through April 30, 1999): The following plan describes production, training, standards refinement, and feedback plans.

      1. Review of EADs: The project will begin with Berkeley's review of the EADs created by each institution during the planning phase. The American Heritage Virtual Digital Archive project will have developed recommendations for a range of acceptable practice for encoding data with the EAD. These recommendations will be further tested, verified, and modified if necessary. Participants will then revise the EADs if necessary. This review will be completed by June 1998.

      2. Training: In June 1998, a one-week intensive training workshop for the participants' Project Coordinators and technical liaisons covering all aspects of the project will be given at Berkeley by the Project Manager with the assistance of staff from Berkeley's Electronic Text Unit (ETU), Dr. Howard Besser, staff from the preservation departments at Cornell and Berkeley, and other experts as appropriate (possibly from CNRI, the Library of Congress, or other institutions). The workshop curriculum will introduce participants to the draft practices for creating administrative and structural metadata to be used in the project, conversion methods and image capture guidelines for various source documents, discussion and development of standard naming conventions, pointing and linking information (URN/URL), and a thorough discussion of various production methods. During the period of instruction, discussion, and hands-on training, procedures will be modified as necessary. These procedures will then serve as norms for the remainder of the project, being modified periodically through a process of community review. Berkeley will host a private listserv specifically dedicated to this project so that participants can work together at all times.

        During Fall 1998, Berkeley staff will conduct site visits to each participating institution to review work, monitor progress, give further instruction as necessary, respond to questions, or gather recommendations for change.

        Following the production phase, Project Coordinators and technical liaisons will return to Berkeley for a January 1999 workshop to finalize recommendations for practices to be used in subsequent projects, which will be formally introduced to the library community in the third phase.

      3. Production and Integration of Data

        1. Union Catalog: Berkeley will create the union catalog for this project, and will accept MARC records from the participants through FTP. Records with appropriate links (i.e., handles) to EAD will begin coming in June 1998 and will all have been loaded by December 1998.

        2. Union Database of Intermediate Metadata: EADs reviewed by Berkeley’s Electronic Text Unit, and edited by participants, will be FTP'd into the union database at Berkeley. EADs with appropriate links (i.e., handles) to images will begin coming in June 1998 and will all have been loaded by December 1998.

        3. Creation of Digital Images: Images will be produced, along with administrative and structural metadata; the images will be named, linked to EADs, and EADs to collection-level records. Many project activities will overlap and a number of teams will be working concurrently on the different aspects of the project at each of the participating institutions. Approximately 4000-5000 images will be captured by each participating institution. Image conversion should begin at each institution by August 1998 and be completed by December 1998. Metadata will be created on a similar schedule, and mechanisms for its processing established for each institution and the entire project.

          The steps followed by each of the collaborating institutions will be similar to those described below for Berkeley's part of the project. Collections will be prepared for digitization by project staff (for the collections included in the project see Appendix G). A selection team will develop and document an image selection plan (since, for practical reasons, not all of the contents of selected collections will be digitized). The selection plan will determine criteria for providing title, name, subject, and genre access, where appropriate, to individual images and to "clusters" of related images, which will be retrieved as groups.

          At Berkeley, the selection plan will be undertaken by the project team in The Bancroft Library Technical Services Department (BTS), according to guidelines established by the curator, Dr. Bonnie Hardwick, who will consult as necessary with faculty. Selected items will be digitized by the Library Photographic Service (LPS), an in-house service agent. Digital reproduction may be done in several ways, including flatbed scanning, conversion of photographic intermediates, or digital camera. Participants will use agreed upon standards for image capture when they exist; but establishment of imaging standards is not a primary focus of this project.

          The selection of the images and recording of administrative and structural metadata for the 5,000 items at Berkeley will take approximately five months (June 1998-October 1998). The collaborating institutions will follow roughly the same time table, even if they choose to employ different procedures. Coordination of the participants will be managed by the Project Manager at Berkeley, who will also monitor the production schedule and the quality of the data produced by all participants during the course of the project; ensure that the methods used follow project guidelines and are cost-effective; and confirm that the required number of images is produced.

        4. Storage of Digital Images: Although the participants' standardized intellectual metadata (MARC collection-level records and SGML-encoded finding aids) will be integrated on a central server at Berkeley, the digital images of items from their collections, along with associated structural and administrative metadata will be stored locally on the participants' own servers. Berkeley will offer technical advice and consulting services for participants throughout the project.

        5. Public Access: Berkeley will maintain a website throughout the project, describing the project and providing periodic updates about progress. The project's website will also make metadata and naming proposals available for public comment and point to the union databases of MARC and EAD records. As completed collections become available (starting in September 1998), EADs with images will be linked to the project website, comprising a prototype digital library.
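
        The chain of links described in this section (collection-level records pointing to EADs, and EADs pointing to images) lends itself to a simple consistency check before collections are made public. The function below is a hedged sketch: its name, its arguments, and the use of plain dictionaries in place of real MARC records, EAD files, and handles are assumptions made for illustration.

```python
def find_dangling_links(marc_to_ead, ead_to_images, available_images):
    """Return (source, target) pairs whose target cannot be found.

    marc_to_ead:      collection-level record id -> finding-aid name
    ead_to_images:    finding-aid name -> list of image names it points to
    available_images: set of image names the repositories can resolve
    """
    dangling = []
    # Every collection-level record should point to a loaded finding aid.
    for record_id, ead_name in marc_to_ead.items():
        if ead_name not in ead_to_images:
            dangling.append((record_id, ead_name))
    # Every image named in a finding aid should be resolvable.
    for ead_name, image_names in ead_to_images.items():
        for image_name in image_names:
            if image_name not in available_images:
                dangling.append((ead_name, image_name))
    return dangling


# Usage with made-up identifiers.
marc = {"rec-1": "ead-1", "rec-2": "ead-9"}
eads = {"ead-1": ["img-1", "img-2"]}
print(find_dangling_links(marc, eads, {"img-1"}))
# [('rec-2', 'ead-9'), ('ead-1', 'img-2')]
```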

      4. Evaluation and Dissemination of Project Results: The project will be under continuous evaluation by participants, scholars, technical consultants, and the library community; standards, practices, and procedures will be modified throughout the course of the project as a result of experience, deliberation, and review of related work occurring simultaneously elsewhere. The Interactive Media Group at Cornell will lead a comprehensive, formal review of the project, including user studies. A variety of methods will be used to ensure wide dissemination of information about the project while it is in progress, to allow feedback from users and other libraries about the prototype practices for administrative and structural metadata, and naming conventions; and to present formal results. For a full description of the evaluation and dissemination plans, see sections VI and VII, below.

    4. Dissemination Phase (Beginning June 1999): Following completion of the Research and Production Phase, the NDLF will fund an invitational seminar to review project results. Participants will include representatives of a broad spectrum of fields and interest groups, including, for example, digital library experts, archivists and special collections librarians, scholars, computer scientists, museum technologists, and others who have participated in other phases of development of the EAD protocols, are engaged in similar work, or who have appropriate expertise. The results of this phase will include widespread dissemination of the results of the project, refinement as necessary of the practices established, and formulation of an agenda for further community review and acceptance.

    5. SUMMARY: PROJECT TIME TABLE

       Planning Phase: July 1, 1997 through April 30, 1998
      Aug. 1997: Select collections to be included in the project; create USMARC collection-level records as needed in OCLC or RLIN, following national standards.
      Sept. 1997: Identify staff to function as project Technical Liaisons as well as Project Coordinators.
      Oct. 1997: Begin creating Finding Aids for each selected collection in compliance with the EAD SGML DTD; all Finding Aids will be created by April 30, 1998. Administrative image metadata model for project distributed by Berkeley for participant and broader review.
      Nov. 1997: Draft practices for image capture documents completed by Berkeley and Cornell staff and circulated for review.
      Dec. 1997: Structural metadata model, based on collection descriptions, distributed by Berkeley for review by participants and scholars.
      Jan. 1998: Participants to review in-house scanning expertise and experience and overall infrastructure, relative to models offered (draft practices and models above); additional training and documentation needs identified and solutions planned.
      Spring 1998: Participants develop documented rationale for selection of items within collection based on developing project models and practices; each institution to identify up to items for conversion.
      Spring 1998: Berkeley will procure a Handle server for the project; initial testing to be completed for operation by March 30, 1998.

       Research and Production Phase: May 1, 1998 through April 30, 1999
      June 1998: Berkeley reviews participants' EAD work on finding aids for collections of this project. Participants' Training Workshop at Berkeley, for Project Coordinator and Technical Liaison from each participating institution (one week). Project homepage and listservs begun.
      Aug. 1998: Berkeley installs union catalog for collection-level MARC records for project and begins receiving records from participants via FTP.
      Sept. 1998: Berkeley installs EAD union catalog and begins receiving EADs from participants via FTP.
      Aug.-Dec. 1998: Image conversion at each institution: retrieve collections, digitize and load images locally, and assemble metadata, including adding handles to MARC catalog records and EADs.
      Nov. 1998: Site Visits: Berkeley Project Leader to visit each participant's site to monitor progress.
      Dec. 1998: Evaluation Team (EVAL) preliminary work: to present model at January meeting.
      Jan. 1999: Second meeting of Project Coordinators and Technical Liaisons from each participating institution, to revise working documents and finalize models based on their use. Also, introduction of evaluation procedures.
      April 30, 1999: Evaluation process and final report for NEH completed.

       Dissemination Phase: May 1, 1999 through August 30, 1999
      May 1--Aug. 30, 1999: NDLF-sponsored invitational conference on MoAII Testbed Project results held. Conference report published within 90 days.

      Project staff write and deliver papers; demonstrate prototype at conferences and meetings.

    6. Management Plan: The UC Berkeley Library will be the lead institution among the project's collaborators. The project will be managed by the UC Berkeley Library through the Berkeley campus Sponsored Projects Office in accordance with all applicable Federal and University guidelines. Coordination of the participating institutions will be overseen by a Management Council comprising the Project Director, Bernard J. Hurley; a Project Manager to be hired; and project managers from each participating institution. Key personnel from Berkeley's project team will coordinate each step of the project with the other collaborating institutions and advise them on procedures and technical matters connected with the project. A project manager at each institution will guide local production of catalog records, finding aids, and images; serve as liaison to Berkeley's management team; participate in the consortial management team for the project; and assist in evaluation. Each partner will also designate a technical liaison to coordinate technical implementation with the Berkeley team. OCLC and The Research Libraries Group will each provide a liaison to this project, and CNRI will serve as technical partner on development of naming protocols. All of the principals in all of the participating libraries have extensive experience in managing complex projects and in collaborative efforts among libraries. The NDLF Planning Task Force and its architecture group will review the project’s progress regularly. Work will be closely coordinated with the CLIR/ACLS scholarly task forces. The Management Team includes:

      1. Principal Investigator: Dr. Peter Lyman will provide policy oversight of the project, coordinating project activities at the policy level with the other participating institutions' administrators. He will assure the continued development of the prototype access system and the maintenance of the data created in the project. He will participate in disseminating the project results and will represent the project to national bodies such as the Association of Research Libraries, the Council on Library and Information Resources, the National Digital Library Federation Policy Board, and the Coalition for Networked Information.

      2. Project Director: Mr. Bernard J. Hurley will lead the design of the research methodology and technical architecture of the demonstration project. He will have primary responsibility for planning and directing each phase of the project. He will direct the evaluation of software and hardware and oversee the development of the prototype system. He will consult with CNRI on the development of the Handle System URN management software. He will serve as administrative liaison with the Project Coordinators from the collaborating institutions, and will be the project's representative to the NDLF Architecture Group. The Project Manager will report to him, and he will participate in efforts to disseminate the project's results.

      3. Consultant: Dr. Howard Besser will provide expertise on imaging and metadata. He will assist in the development of metadata proposals, serve as liaison with related projects elsewhere, and assist with dissemination.

      4. Project Manager: A Project Manager, reporting to the Project Director, will be hired to supervise implementation of the project. S/he will assist the Project Director in the design of the prototype architecture. S/he will coordinate operations among the participating institutions, develop the training curriculum, and carry out the training of the Project Coordinators of the collaborating institutions. Following training, s/he will visit each of the collaborators to provide follow-up support and consultation. The Project Manager will serve as an SGML technical consultant to the project, and s/he will also coordinate and supervise Electronic Text Unit staff mounting and publishing the marked-up texts for Berkeley. S/he will monitor the quality of all project finding aids. S/he will participate in efforts to disseminate the results of the project.

      5. Archival Control Coordinator: Mr. Jack von Euw, Head of Technical Services in The Bancroft Library, will oversee Berkeley’s selection of cataloging records, finding aids, and images. He will consult with collaborating institutions concerning finding aids and the integration of related collections in the prototype. He will participate in efforts to disseminate the results of the project.

      6. Curator of the Bancroft Library Western Americana Collections: Dr. Bonnie Hardwick will assist in the creation of the project's materials selection plan and will participate in the process of selecting individual images for digital reproduction. She will serve as liaison with curators in the participating libraries, and with scholars. She will also provide assistance in the authoring of finding aids for some of the collections used in the project. She will participate in the dissemination of the project's results.

      7. Berkeley Library Imaging Coordinator: Mr. Barclay W. Ogden, Head of the Conservation Department at Berkeley, will consult with collaborating institutions on imaging and procedures for photo and digital reproduction of the materials used in the project. He will supervise Berkeley's imaging process, resolving any questions and ensuring that quality and production goals are met. He will assist in the development of the administrative metadata recommendations and participate in dissemination of the project results.

      8. Evaluation Coordinator: Dr. Geri Gay, Director of Cornell University's Interactive Multimedia Group (IMG), will develop and carry out the evaluation of the prototype system. She will develop the evaluation guidelines and questionnaires. She will work with the collaborators to make sure that these reflect their concerns about performance of the prototype. She will supervise the implementation of the evaluation procedures, including distributing evaluation materials. She will compile and analyze the evaluation data. She will write the evaluation report for inclusion in the project's final report. She will participate in the dissemination of the project's results.

      9. Participating Institutions Project Coordinators: For resumes of project directors and other key staff at the participating institutions see Appendix M.

      10. Technical Staff: The UC Berkeley Systems Administrator (Programmer Analyst III) will install the Handle System, a URN management and resolution system developed by CNRI. The Systems Administrator will install and support Unix-based central software (OCLC's SiteSearch, DynaText, DynaWeb, ArborText, etc.), configure the Unix kernel to support new hardware (e.g., magnetic disk), and tune and monitor the system's performance. The Programmer Analyst III will use the Handle System to add URN/URL support to the DynaText web browser and the image viewer.

        The Production Controller (Programmer Analyst I) will be responsible for receiving (via FTP) the metadata from all participants and loading/indexing it into DynaText on a regular, frequent schedule. S/he will provide technical support to project participants.

        An SGML DTD specialist will be hired to develop structural metadata proposals. The incumbent will coordinate this work with similar work being done elsewhere, for example at Michigan, through TEI, etc.

      11. Image Selection Staff: A full-time Library Assistant IV will serve as work leader for Berkeley's selection of images and its assembly of cataloging records and finding aids; two Library Assistant IIIs will assist the work leader for Berkeley's selection of images; four student assistants will assist the library assistants with the selection of images and the assembly of cataloging records and finding aids. They will also assist with other support functions connected with the imaging workflow and finding aid conversion.

      12. NDLF Planning Task Force: The NDLF Planning Task Force will receive regular reports of progress, and will review project methodologies. It will disseminate information to NDLF libraries that are not participating in this particular project, and will ensure that the MoAII Testbed Project is effectively coordinated with other NDLF projects. The NDLF Planning Task Force will review proposals for NDLF funding, and make recommendations about these requests to the Board of the Council on Library and Information Resources. The Architecture subgroup of the NDLF Planning Task Force will play a key role in evaluating systems architecture proposals.

    7. Digital Technology Plan:

      1. Conversion Overview: Catalog records will be created (through a local system or a bibliographic utility) using the USMARC standard, sent via File Transfer Protocol (FTP) to Berkeley, and loaded into the Union Catalog for the MoAII Testbed Project. The union catalog will be based on OCLC’s SiteSearch software, which has been contributed for this project. Finding aids will be converted to the EAD standard using SGML authoring tools (ArborText's Adept Editor, SoftQuad's Author/Editor, etc.), sent via FTP to Berkeley, and loaded into the finding aids union catalog for the Making of America II Testbed Project. This union catalog will be based on EBT’s (Electronic Book Technology’s) DynaText software. Images and associated metadata will be created at the participating institutions and loaded on their local servers.

      2. Access Overview: Berkeley will install and run a URN naming and resolution server developed by CNRI: the Handle System. Please note that in the Handle System URNs are called handles. USMARC catalog records will contain the handle (i.e., URN) of the related finding aid. Similarly, finding aids will contain the handles (i.e., URNs) of related images. The following represents a typical access scenario (for samples of the access system, see Appendix A):

        1. Search the Making of America MARC Union Catalog: The user enters a union catalog search on a Web browser. The participating institution’s Z39.50 compliant web server (e.g., Berkeley uses OCLC’s WebZ software) sends the search via Z39.50 to the union catalog (i.e., SiteSearch), retrieves the results and then displays catalog records in the Web Browser.

        2. Select the Link to the Finding Aid from the Catalog Record: The handle (i.e., URN) for the related finding aid displays as a highlighted field in the USMARC record. The user clicks on the handle to see the finding aid. The handle is sent to the Berkeley local handle server and resolved into a URL for the finding aid.

        3. Display the Finding Aid: The finding aid URL is sent to DynaWeb which retrieves and converts the SGML-based document into HTML for display on the Web browser.

        4. Select a Link to an Image from the Finding Aid: The handles (i.e., URNs) for the related images display as highlighted fields in the finding aid. The user clicks on a handle to see an image. The handle is sent to the Berkeley handle server and resolved into a URL for the image.

        5. Display the Image: The image URL will be sent to the appropriate participating Web server which will retrieve and display the image on the browser running the appropriate plug-in or helper application.
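
The access scenario above amounts to two rounds of handle resolution: one from the catalog record to the finding aid, and one from the finding aid to each image. The following is a minimal Python sketch of that chain, not the project's implementation. The handle strings, URLs, record layouts, and the `HandleServer` class are all hypothetical, and an in-memory dictionary stands in for the CNRI Handle System.

```python
from dataclasses import dataclass, field


class HandleServer:
    """Toy stand-in for a local handle server: maps handles (URNs) to URLs."""

    def __init__(self):
        self._table = {}

    def register(self, handle, url):
        self._table[handle] = url

    def resolve(self, handle):
        # A real resolver would query the Handle System; here it is a dict lookup.
        try:
            return self._table[handle]
        except KeyError:
            raise LookupError(f"unknown handle: {handle}") from None


@dataclass
class CatalogRecord:
    title: str
    finding_aid_handle: str  # handle carried in the collection-level MARC record


@dataclass
class FindingAid:
    title: str
    image_handles: list = field(default_factory=list)  # handles embedded in the EAD


def image_urls_for(record, finding_aids, server):
    """Follow catalog record -> finding aid -> images, resolving each handle."""
    aid_url = server.resolve(record.finding_aid_handle)
    aid = finding_aids[aid_url]  # stands in for DynaWeb fetching the finding aid
    return [server.resolve(h) for h in aid.image_handles]


# Hypothetical sample data, for illustration only.
server = HandleServer()
server.register("hdl:example.lib/aid-001", "http://findingaids.example.edu/aid-001")
server.register("hdl:example.lib/img-001", "http://images.example.edu/img-001.jpg")
server.register("hdl:example.lib/img-002", "http://images.example.org/img-002.jpg")

record = CatalogRecord("Sample collection", "hdl:example.lib/aid-001")
finding_aids = {
    "http://findingaids.example.edu/aid-001": FindingAid(
        "Sample finding aid",
        ["hdl:example.lib/img-001", "hdl:example.lib/img-002"],
    )
}

urls = image_urls_for(record, finding_aids, server)
```

In this sketch the catalog record and finding aid store only handles, so if an institution moved its image files, only the handle table would need updating.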

      3. Hardware and Software Summary:

        1. SGML Software: SGML Authoring Tools (ArborText's Adept Editor, SoftQuad's Author/Editor, etc.); SGML Browsers (Electronic Book Technology's DynaText browser and DynaWeb, an SGML-to-HTML real-time converter that allows viewing SGML documents on the Web); SGML Database Manager (Electronic Book Technology's DynaText Database); appropriate image viewers (XV for UNIX, etc.).

        2. Union Catalog Software: The union catalog MARC records will be loaded onto the Z39.50-compliant OCLC SiteSearch System running at Berkeley.

        3. Web Software: Users will access all data via Web browsers (e.g., Netscape Navigator, Microsoft Internet Explorer). Each campus will run a Web server (e.g., from Netscape or NCSA) to provide access to the decentralized images.

        4. URN Creation/Management/Resolution Software: The project will install and run the Handle System developed by CNRI. Besides the global handle server maintained by CNRI, the Berkeley campus will also run a local handle system that will be the home handle service for this project. Handles generated by each naming authority will be stored on this system, and requests to resolve the handles into URLs will be answered by the Berkeley handle server.

        5. Central Hardware: The SiteSearch MARC union catalog and the DynaText SGML finding aids database will be loaded at Berkeley on the Library’s Sun Microsystems SPARCcenter 2000E (a gift of Sun Microsystems). This server and all data are backed up on a regular basis.

        6. Decentralized Hardware - Image Storage: Images will be decentralized to servers at participating institutions. Images will be named with URNs (handles) and accessed via Web server software.

  5. STAFF QUALIFICATIONS (for additional information, see Appendix M):

    1. Principal Investigator: Dr. Peter Lyman is University Librarian of the University of California, Berkeley. Before coming to Berkeley, he served as University Librarian and Dean at the University of Southern California. He has written and consulted widely on the library of the future, particularly on issues related to the use of digital and networked information to create uniquely appropriate and powerful tools and collections for research, teaching, and learning. Dr. Lyman will spend approximately two hours per week on the project.

    2. Project Director: Bernard J. Hurley, The Berkeley Library’s Chief Scientist, has served as the Director for Access Services during the past two years and was formerly Director for Library Systems, beginning in 1981. As Chief Scientist, Mr. Hurley is responsible for strategic technical planning and systems architecture for the Berkeley Library. He is The Library’s chief liaison with technology partners in the corporate and research sectors. He has responsibility for program planning and policy formulation as a member of the Library’s Administrative Group. Mr. Hurley has been working in the field of library automation for the last seventeen years. While at Berkeley he has played a central role in developing the GLADIS System, Berkeley's online catalog, catalog maintenance, authority control, and circulation system, and its access to the Berkeley Campus Information Network. He has served as the Project Director for Berkeley's U.S. Department of Education-funded Finding Aid Project, NEH-funded California Heritage Digital Image Access Project, and NEH-funded American Heritage Virtual Digital Archive Project. Mr. Hurley will spend four hours per week on the project.

    3. Technical Consultant: Dr. Howard Besser, Professor in the School of Information Management and Systems at Berkeley, is an internationally recognized expert in digital imaging metadata standards and systems and has consulted, written, and lectured extensively on these issues. He will spend approximately 10 hours per week on the project.

    4. Project Manager: A Project Manager will be hired specifically for this project. Qualifications will include supervisory and management experience, knowledge of SGML and DTD development, and a basic understanding of scanning, metadata, and digital library architectures.

    5. Qualifications of Other Key Personnel:

      1. Archival Control Coordinator: Jack von Euw, Head of The Bancroft Library Technical Services, is responsible for the reorganization and management of the unified technical services. His previous assignment included managing The Bancroft Library Manuscripts Retrospective Conversion Project and the processing phase of the Preservation and Improved Access of the C. Hart Merriam Papers Project. He was the Archival Coordinator for the Berkeley Finding Aid Project, the California Heritage Project, and the American Heritage Project. Mr. von Euw will spend four hours per week on the project.

      2. Curator of the Bancroft Library Western Americana Collections: Dr. Bonnie Hardwick’s doctorate focuses on Western American history. She has an extensive background in Special Collections, manuscript curatorship and processing, and archival librarianship. She will spend two hours per week on the project.

      3. Berkeley Library Imaging Coordinator: Barclay W. Ogden, Head of the Conservation Department, is a specialist in the design and administration of library preservation programs. He founded the Conservation Department at Berkeley in 1980, which has grown under his direction to become one of the five largest such programs in U.S. research libraries. Mr. Ogden serves as the Director of the University of California Preservation Program, a cooperative effort of the nine University of California campuses, and was a leader in California's effort to develop a statewide preservation plan, under contract to the California State Library. He is Imaging Coordinator for the California Heritage Project. Mr. Ogden will spend two hours per week on the project.

      4. Evaluation Coordinator: Dr. Geri Gay directs The Interactive Multimedia Group (IMG) at Cornell University. The IMG is an interdisciplinary research and design team created to understand and improve the expanding role of computers in communicating, learning, working, and playing. IMG studies how humans interact with computers, and how technology can mediate communication.

  6. EVALUATION:

    The evaluation of the Making of America II Testbed Project will be led by Cornell University’s Interactive Media Group (IMG; see Appendix K). The plan will focus on four major issues: 1) the selection of classes of digital objects to be standardized and the elements used to create structural and administrative metadata for each class; 2) the effectiveness and ease of use of the MoAII Testbed for discovering and navigating digital objects; 3) the feasibility of using the MoAII Testbed architecture in a national digital library; and 4) the economic feasibility of the MoAII Testbed Project approach.

    Issue 1: The evaluation plan calls for the development of studies to determine whether the project selected the proper classes of digital objects to be standardized. In addition, it will evaluate the elements selected to be encoded as structural and administrative metadata for each class. The goal is to determine if this selection of classes and metadata elements provided value to scholars and students who used the digital objects within the MoAII Testbed.

    Issue 2: The project will also evaluate how effective users found the MoAII Testbed as a research and learning tool, including issues such as ease of use. Evaluation criteria will include searching, navigation, pointing, linking, mapping issues, and control and display capabilities for the different classes of digital objects. User response to how each class of digital object could be discovered, displayed, and navigated will be measured. Beyond that, the evaluation will attempt to measure not only whether the MoAII Testbed architecture improved search capabilities and navigation options over those of the print counterparts, but also whether it encourages new ways of searching and understanding the materials represented.

    Issue 3: There will also be a technical evaluation of the MoAII Testbed architecture. The design of this prototype system will be examined, including performance issues associated with the delivery of digital surrogates of primary sources over the network and the use of Z39.50 to connect MARC bibliographic and SGML textual databases. The evaluation will also study the ability of the testbed architecture to scale to a large number of users and a large number of digital objects spread across distributed repositories. In particular, the different digital object naming schemes considered in the project will be evaluated for use in a true national digital library architecture.

    Issue 4: The project will evaluate whether the proposed MoAII model offers a cost-effective method of providing access to and control of digitized primary source material available over the Internet. Because this is a research and demonstration project, the amount of usable data will be limited. Nevertheless, economic and workflow data from all collaborators will be collected and analyzed. The project will monitor and evaluate the costs of training and of all the different methods used in creating digital objects that act as surrogates for primary source materials, including the creation and encoding of metadata. The final evaluation report will make recommendations on the most economical approaches to the creation of a distributed repository for archival digital objects.

    For the proposed project, the IMG will employ classroom experiments, online questionnaires, and various qualitative methods to assess the feasibility and utility of network-based finding aids. The evaluation plan will call for data collection, interpretation, and reporting at three intervals: prior to implementation; at mid-project; and just before project closure. IMG's participation will include evaluation planning and administration, development of qualitative and quantitative measures, data gathering, analysis, and reporting.

  7. DISSEMINATION:

    1. Dissemination Plan: Project results will be made widely available.

      1. Detailed reports will be submitted to the NEH, CLIR Board, the Policy Committee of the NDLF, the National Historical Publications and Records Commission (NHPRC), the National Archives and Records Administration, the SAA Committee on Archival Information Exchange, OCLC, RLG, The Cooperative Interchange of Museum Information (CIMI), Association for Information and Image Management (AIIM), and the administrations of all of the project's collaborators. The reports will be posted on the Internet for public access and review.

      2. The Digital Library created during the MoAII Testbed Project--its catalog records, finding aids, and digital images--will be disseminated throughout the world on the Internet. It will be demonstrated at exhibitions and in presentations at professional conferences and meetings, e.g., the CNI Task Force meeting, the ALA Annual Meeting, the Rare Books and Manuscripts Section of the Association of College and Research Libraries (ACRL RBMS pre-conference), the Society of American Archivists Annual Meeting (SAA), and the American Society for Information Science (ASIS) Annual Meeting. It will also be demonstrated at scholarly meetings.

      3. Papers will be contributed to professional conferences and meetings (e.g., ALA, ACRL RBMS pre-conference, SAA, ASIS, Western History Association Annual Meeting, Organization of American Historians Annual Meeting, and the American Historical Association, Pacific Coast Branch Annual Meeting).

      4. Papers will be submitted to manuscripts and archives journals (e.g., The American Archivist); automation journals (e.g., LITA's Information Technology and Libraries, Educom Review, Academic and Library Computing, and D-Lib Magazine); and history journals (e.g., Western Historical Quarterly and Pacific Historical Review).

      5. Draft practices, procedures, recommendations, etc. will be made widely available throughout the project, using the MoAII Testbed Project website.

      6. An invitational conference, following the conclusion of the project, will be funded by NDLF. The purpose of this conference will be to evaluate the feasibility and desirability of building on the prototype developed in the project; to refine the set of "best practices" for the creation of digital object archives and libraries of distributed resources; to recommend next steps in the development of standards and policies for such projects; and to formulate additional research and demonstration questions to be pursued.

    2. Continuation of Work After the Termination of NEH Funding: The project's participating institutions will have a considerable stake in making sure the research in this project is successfully carried out and that its results are followed up with further research and development. The members of the NDLF will use the project results to stimulate the national process that will lead to the formal adoption of metadata standards for digital library objects. More ambitiously, NDLF will use project results to lead to the creation of a national architecture for the digital library which will allow for access to decentralized collections of digital surrogates on the network. Moreover, the digital library created through this project will serve as the foundation for a production effort to create the Making of America Digital Library. We expect that this project will identify a future research agenda for issues such as the following:

      1. archiving, refreshing, and migration to ensure the longevity of the digital library;

      2. development of scholarly tools to enhance the usefulness of the digital library;

      3. development of digital library architectures that support storage and delivery of complex digital objects, including methods for their use;

      4. establishment of practices for the management of intellectual property, including authentication of objects and users, authorization, and internet commerce;

      5. investigations into economic models that can sustain and expand a distributed national digital library.

        During the course of this project, each participant will develop expertise in the creation of digital libraries, and will establish a repository for the digital objects created; each institution has committed to keeping the images stored on this repository accessible and to maintaining the links among MARC records, EADs, and images. Berkeley is committed to maintaining the union catalog and EAD database indefinitely, or at least until either an independent entity exists through which such resources are archived and made internationally accessible or it becomes technically feasible to implement the digital library effectively using distributed metadata records.


Footnotes:

1.

The following are suggested administrative metadata categories for archival images based on the Digital Scriptorium Project, a collaborative imaging effort between Berkeley and Columbia. These may serve as a starting point for the MoAII Testbed Project. Included are those elements that are necessary both for initial image processing and for long-term administrative and scholarly use; some are characteristics of the initial image capture, and some may be integrated in the capture system or part of corollary procedures. Those that should be retained for long-term use with the archival version of the image are marked with an asterisk. A proposed starting point for administrative metadata for archival image files includes:

*Date of Capture

Capture System:
*Type, *Brand, *Model, *Bit depth, *Color information system, *File format, Default setting/adjusted

*Image Authentication System Used (such as electronic watermarking)
Then, depending on the type of capture system (Below, possible administrative metadata relating to three different capture methods are described):

Digital Camera:
Batch #, *Pixel dimensions, *dpi, F-stop, *Electronic "shutter" speed, *Filter, Illumination level, *Glass used Y or N

Film/Photo CD:
Film roll #, Film frame #, Roll index #, F-stop, Shutter speed, *Filter, Film type, Illumination level, *Glass used Y or N, *Pixel dimensions, *dpi
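
As an illustration only, the categories above can be modeled as element lists in which each element carries a flag for long-term retention (the asterisk). The sketch below is hypothetical Python: the field names, the sample values, and the `retained` helper are assumptions made for this example, and it covers only the common elements and the digital camera elements listed above.

```python
# Each entry maps an element name to True if it is marked with an asterisk
# above, i.e. it should be retained with the archival version of the image.
COMMON_ELEMENTS = {
    "date_of_capture": True,
    "capture_type": True,
    "capture_brand": True,
    "capture_model": True,
    "bit_depth": True,
    "color_information_system": True,
    "file_format": True,
    "default_or_adjusted_setting": False,
    "image_authentication_system": True,
}

DIGITAL_CAMERA_ELEMENTS = {
    "batch_number": False,
    "pixel_dimensions": True,
    "dpi": True,
    "f_stop": False,
    "electronic_shutter_speed": True,
    "filter": True,
    "illumination_level": False,
    "glass_used": True,
}


def retained(record, *schemas):
    """Return only the elements of `record` flagged for long-term retention."""
    keep = {name for schema in schemas for name, flag in schema.items() if flag}
    return {name: value for name, value in record.items() if name in keep}


# Hypothetical capture record; the values are invented for illustration.
capture = {
    "date_of_capture": "1998-09-15",
    "file_format": "TIFF",
    "batch_number": "B-12",
    "pixel_dimensions": "3072x2048",
    "f_stop": "f/8",
    "glass_used": "N",
}

archival = retained(capture, COMMON_ELEMENTS, DIGITAL_CAMERA_ELEMENTS)
```

Keeping the retention flag next to each element name means the split between processing-time and archival metadata is declared once, rather than repeated in every procedure that handles the records.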


THE MAKING OF AMERICA II TESTBED PROJECT

 

Appendix A

 

Sample Discovery and Navigation System:

 

Collection-level Records

Encoded Archival Descriptions

Digital Images


Appendix B

 

The National Digital Library Federation


Appendix C

 

The Making of America II Project, Background


Appendix D

 

Subcontractor proposals and budgets


Appendix E

 

The Encoded Archival Description


Appendix F

 

Architecture for Information in Digital Libraries


Appendix G

 

Previous Research:

The Berkeley Finding Aid Project

California Heritage Project

American Heritage Virtual Digital Archive Project

UCEAD Project


Appendix H

 

Related Work:

 

Making of America

J-STOR

American Memory

Dublin Core


Appendix I

 

Naming:

 

IETF URNs

OCLC PURL

CNRI Handle

OMG CORBA

Microsoft DCOM

RLG Arches


Appendix J

 

Structural Metadata:

E-Bind


Appendix K

 

The Interactive Media Group,

Cornell University


Appendix L

 

History of Grants


Appendix M

 

Resumes of Project Participants

Advisory Board


Appendix N

 

Suggested Evaluators

