PROJECT ABSTRACT
Applicant:
The Library
University of California
Berkeley, California 94720
Title of Project:
THE MAKING OF AMERICA II TESTBED PROJECT
Principal Investigator:
Peter Lyman
University Librarian
245 Library
University of California
Berkeley, CA 94720
(510) 642-3773
Project Director:
Bernard J. Hurley
Library Chief Scientist
245 Library
University of California
Berkeley, CA 94720
(510) 642-3773
Funding Requested: $350,000
Project Period: May 1, 1998 - April 30, 1999
Collaborating Institutions:
Project Participants:
University of California, Berkeley Library
Cornell University Library
New York Public Library
Penn State University Library
Stanford University Library
National Organizations:
National Digital Library Federation
The Corporation for National Research Initiatives
The Council on Library and Information Resources
American Council of Learned Societies
OCLC Online Computer Library Center, Inc.
Research Libraries Group
SUMMARY: THE MAKING OF AMERICA II TESTBED PROJECT
The primary resources in the nation’s research libraries, which constitute an invaluable foundation for research, are increasingly viewed as resources for teaching and learning among students in elementary school through college. The geographic distribution of library collections and the fragility and uniqueness of many primary source materials have always presented barriers to access. Heretofore, only those scholars able to travel to individual repositories have been able to utilize all of the resources relevant to their research. Digital technologies offer opportunities to overcome these problems of geographic distribution and fragility of primary sources by making digital surrogates of the materials available on the Internet where they can be used by scholars, students, and the general public.
The Making of America II Testbed Project continues and extends research and demonstration projects that have begun to develop best practices for the encoding of intellectual, structural, and administrative data about primary resources housed in research libraries. It builds most directly on development of the Encoded Archival Description (EAD), now being maintained jointly by the Library of Congress and The Society of American Archivists; it also extends the Making of America Project carried out by Cornell and Michigan. While the EAD provides a community standard for encoding finding aids, it does not provide any guidance for creating or encoding the digitized surrogates of the primary source materials pointed to by these finding aids. Therefore, the next step in the process of developing seamless access through EAD-encoded finding aids is the development of related community practices for creating and encoding the digitized versions of the primary sources. The research objectives of this proposal include:
- Identifying the classes of primary source materials that require community standards (e.g., digital books, manuscripts, diaries, pamphlets, etc.);
- Defining the structural metadata for each class that is used to display and navigate an object from the class (e.g. a digital book has a table of contents, chapters, pages, an index, etc.);
- Developing common practices for the creation of digital surrogates for primary resources, including the capture of administrative metadata that describes how the object was created, who may use it (i.e., rights management), etc.;
- Defining practices for encapsulating the administrative and structural metadata along with the digitized version of the primary resource to create an archival digital library object;
- Investigating naming schemes that will uniquely identify each digital library object so it can be accessed from its home repository located anywhere on the Internet; and
- Creating a prototype digital library that will allow the community to experiment with and evaluate different practices for creating and naming digital library objects.
The umbrella Making of America Project will create a digital library of primary source materials relating to the Making of America, focusing on the Gilded Age, 1876-1900. The Making of America II Testbed Project collection will be "Transportation, 1869-1900", particularly the development of the railroads and their relationship to the cultural, economic, and political development of the country. Creation of a coherent corpus of material on a focused theme will enable testing of end-user acceptance of the methods developed during the project.
THE MAKING OF AMERICA II TESTBED PROJECT
TABLE OF CONTENTS
- Significance of the Proposed Project
- Overview:
- Need
- National Priority
- Value for Research, Education, and Public Projects in the Humanities
- Nature, Size, Intellectual Content, of Collection
- Research Objective: Community Standards for the Creation and Use of
Digital Library Materials
- Research Objective: Naming Conventions and Systems
- History of Prior Research
- Overview
- The Berkeley Finding Aids Project
- The California Heritage Digital Image Access Project
- The American Heritage Virtual Digital Archive Project
- The UC EAD Project
- Other Related Work
- Methodology and Standards
- Methodology Overview:
- Selecting Collections for the Project
- Accessing the Digital Library: The Users' Perspective
- Metadata Research
- Defining Structural Metadata for MOA II Digital Objects
- Defining Administrative Metadata for Images
- Research on Naming Conventions and Systems
- The Need for Naming Conventions and Systems
- Naming Convention Research Methodology
- Standards and Best Practices
- Plan of Work
- Overview
- Planning Phase
- Research and Production Phase
- Dissemination Phase
- Project Time Table
- Management Plan
- Digital Technology Plan
- Staff Qualifications
- Principal Investigator
- Project Director
- Technical Consultant
- Project Manager
- Other Key Personnel
- Evaluation
- Dissemination of Research
- Dissemination Plan
- Continuation of Work After the Termination of NEH Funding
- Project Budget
- Appendices
- Sample Discovery And Navigation System:
- Collection-level Records
- Encoded Archival Descriptions
- Digital Images
- The National Digital Library Federation
- The Making of America II Project, Background
- Subcontractor Proposals and Budgets
- The Encoded Archival Description
- Architecture for Information in Digital Libraries
- Previous Research:
- The Berkeley Finding Aid Project
- California Heritage Project
- American Heritage Virtual Digital Archive Project
- UCEAD
- Related Work:
- Making of America
- J-STOR
- American Memory
- Dublin Core
- Naming and Architecture:
- IETF URN
- OCLC PURL
- CNRI Handle
- OMG CORBA
- Microsoft DCOM
- RLG Arches
- Structural Metadata: E-Bind
- The Interactive Media Group, Cornell University
- History of Grants
- Resumes of Project Participants; Advisory Board
- Suggested Evaluators
- SIGNIFICANCE OF THE PROPOSED PROJECT
- Overview:
The primary resources in the nation's
research libraries, which constitute an invaluable foundation for
research, are increasingly important as resources for teaching and
learning among students in elementary school through college.
For decades, libraries have understood the importance of sharing
information. The most visible examples of this philosophy are the sharing
of collections via interlibrary loan and of catalog records that provide
access to library collections. The foundation of sharing has been built
on the creation of community practices and national standards that define
the information and processes required to share library materials. The
prime example is the development of the USMARC national standard, which
defines a library catalog record. Once the catalog record was
standardized, it became possible to create online union catalogs which now
include records from libraries located all over the world. These union
catalogs then became the primary tool for implementing modern
interlibrary loan programs, as a library user could find out which
libraries owned a particular item and then request it through Interlibrary
Loan.
However, because of their uniqueness or rarity, value, and fragility,
primary resources housed in special collections throughout the world have
never been lent in the ways that published materials have been. The
geographic distribution of these collections has always been a barrier to
research, and therefore only those scholars able to travel to the
individual repositories have been able to utilize all the resources
relevant to their research; primary resources have typically been
unavailable to younger students and members of the general public.
The Making of America II (MoAII) Testbed Project is significant because it will continue the applied research necessary to develop best practices and standards that will guide the creation of large-scale digital libraries which library users of all kinds, from anywhere there is a network connection, can easily use for teaching, learning, or research.
- Need:
The emerging national digital library has the
potential to elevate resource sharing to a new level, as it will be
possible for users anywhere to find and use entire books, journal
articles, and primary source materials directly over the Internet.
Through digital technology, the problems of geographic distribution of
primary sources can be overcome so that these resources can be more widely
exploited for humanistic research, teaching, learning and public
dissemination. However, this potential will be realized only if the
library community agrees to new practices and standards that will allow
digital library materials to be easily located and used. Without
community standards, each library will store its electronic content in a
proprietary format in proprietary computer systems.
The result will be that users will have to search each of these
systems, located at many institutions, to find and use the digitized
materials they require. In short, we will have created a series of
proprietary collections and missed the opportunity to create a national
digital library.
To create a national digital library, it will be necessary to
define: a) community standards for the creation and use of digital
library materials; and b) a national software architecture that
allows digital materials to be shared easily over the network. It is
possible to pursue both these goals concurrently.
The National Digital Library Federation (NDLF: see Appendix B), a
program of the Council on Library and Information Resources, was created
to help promote opportunities and address problems inherent in the
creation of digital libraries. Five of its sponsors are joining in the
present proposal to begin to address these two issues. They will work
together with other NDLF sponsors, the Research Libraries Group (RLG), the
Corporation for National Research Initiatives (CNRI), OCLC, five Council
on Library and Information Resources/American Council of Learned Societies
(CLIR/ACLS) Taskforces to Define Research Requirements of Formats of
Information, The Library of Congress and others to develop best practices
that will be required to implement a national digital library system.
- National priority:
Many libraries have already begun to create
digital library resources, but few have systematically addressed the
issues of best practices and digital library architecture. The Library of
Congress has implemented its National Digital Library Program, The Getty
Information Institute is constructing community resource libraries,
commercial entities are creating digital products, and individual research
libraries are experimenting with digital technologies to solve various
problems and provide services. The longer libraries wait to begin to
develop community agreements about best practices and system architecture,
the more difficult it will be to create distributed digital libraries that
appear as an integrated whole to the user. Without seamless, integrated
access, scholarly productivity will suffer, and the availability of useful
resources will be obscured.
The increasing national commitment to K-12 education combined with the
current national emphasis on strengthening distance education through
improved uses of educational technology also motivates the participants to
begin work on this project. K-12 teaching and learning, as well as
distance education, require that core information resources be made
available in digital form so that they can be used in the classroom and
from remote sites. The Library of Congress has found that its American
Memory project can be effectively used in many different ways by teachers
and students alike. The digital collection constituting the MoAII Testbed
Project, like that in American Memory, will be available to schoolchildren
and their teachers as well as scholars.
- Value for Research, Education, and Public Projects in the
Humanities:
The larger Making of America Project (see Appendix C for background information) will create a collection of vital importance for the study of American history, including its culture, peoples, technology, political and social movements, and arts. The digital collection will be created from among the resources housed in libraries throughout the country, beginning with those of the five participants in this particular project. The collection will complement other digital collections--for example, the Library of Congress’s American Memory project, RLG’s Studies in Scarlet, the American Heritage Project, and the California Heritage Project. By creating a critical mass of materials in a focused subject area, the participants will ensure that the project’s collection can serve as a testbed for evaluating the potential of the digital library to serve the needs of scholarly research as well as K-12 teaching and learning. Humanities resources are particularly useful as a testbed for the digital library because they exist in many different formats (for example, print, manuscript, pictorial, multi-media, artifactual) and are widely used by scholars, independent researchers, school children, and the general public.
- Nature, Size, Intellectual Content of Collection:
Within the
general framework of the MoAII Project, the MoAII Testbed Project proposed
here will focus on the specific subject area of "Transportation in
the United States, 1869-1900", with a particular emphasis on
development
of the railroads. This topic was chosen for the rich range of scholarly
explorations it offers. From 1869 to 1900, the United States was knit
together through forms of transportation of all kinds, but particularly
the railroads. Railroads formed the basis for distribution of industrial
products, opened the West to settlement, served as a foundation for the
economy of the nation, and fostered the inclusion of new ethnic groups
into the American populace. The five participating libraries all have
important collections in this area, and these dispersed collections will
represent a stronger corpus of material if brought together into a single
collection, as is made possible through digital technology. The digital
library collection to be created through this project will encompass many
different forms of documentation (for a full description, see individual
institutions’ proposals in Appendix D), including printed materials,
broadsides, pictorial collections, manuscripts, diaries, letters, etc.
More than 15 individual collections, including at least 25,000 digital
surrogates of documents from those collections, will be represented in the
database by the end of the project.
Moreover, because preliminary
community agreements will have been reached about the description of the
various categories of objects, administrative metadata for digital images,
and naming conventions, the collection will have greater intellectual
integrity than it would if the institutions were acting independently, and the
digital library can grow over time, expanding its subject coverage; adding
important collections; and incorporating a wide variety of contributors
including libraries, museums, publishers, and others.
-
Research Objective: Community Standards for the Creation and Use of
Digital Library Materials: If digital library materials are to be
easily shared across the network, they must be standardized to ensure they
can be located, navigated, and displayed by different computer systems.
NDLF has agreed that the first step in this process is to define the
different classes of digital library materials that may exist. For
example, a separate class could be defined for a digital surrogate of a
book, a manuscript, a music score, etc. Once each class is identified it
is necessary to define the elements that describe each digital
object in that class. Three types of description, or metadata,
are necessary:
- Intellectual Metadata includes the
elements that describe the content of each digital object. For example, a
digital book object would be described by its catalog record and include
elements such as author, title, publisher, etc.; the Encoded Archival
Description (EAD) provides intellectual metadata for collections of
objects (See Appendix E). Intellectual metadata is used to
discover materials of interest. The present project will not directly
address investigations into questions of intellectual metadata, but will
rely on existing standards and practices.
- Structural Metadata
is the information that describes the
internal organization of each digital object in a particular class of
materials so that the user can effectively and efficiently navigate
through the object. In the example of a digital book, the structure could
include a title page, preface, table of contents, chapters, pages, index,
colophon, etc. Structural metadata is primarily used by computers to
navigate that object for a user (for example, to turn to the next page or
to jump to chapter three).
- Administrative Metadata
is used to record information about
the digital object including technical specifications related to image
capture, enhancements, or other derivative actions on the digital object;
and other characteristics that must remain with the object to ensure its
long term retention and use and protect intellectual property rights.
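As an illustration only, the three types of metadata described above could travel together in a single record for each digital object. The Python sketch below is a minimal example under stated assumptions: the class names, the field names (such as capture_resolution_dpi and rights_statement), and the sample values are hypothetical placeholders and are not drawn from any NDLF or MoAII specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntellectualMetadata:
    """Describes content so the object can be discovered (hypothetical fields)."""
    title: str
    author: str = ""
    publisher: str = ""


@dataclass
class StructuralMetadata:
    """Ordered names of the object's internal parts, used for navigation."""
    parts: List[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Capture details and rights information that stay with the object."""
    capture_resolution_dpi: int = 0
    rights_statement: str = ""


@dataclass
class DigitalObject:
    """One digital library object carrying all three kinds of metadata."""
    object_class: str  # for example "book", "manuscript", "music score"
    intellectual: IntellectualMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


# Placeholder values for illustration; none of them come from the project.
book = DigitalObject(
    object_class="book",
    intellectual=IntellectualMetadata(title="Example Railroad Guide"),
    structural=StructuralMetadata(
        parts=["title page", "preface", "chapter 1", "index"]
    ),
    administrative=AdministrativeMetadata(
        capture_resolution_dpi=600,
        rights_statement="illustrative only",
    ),
)
```

Keeping the three groups as separate records mirrors the division of labor in the text: discovery systems would read only the intellectual part, display software the structural part, and repository managers the administrative part.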
- Research Objective: Naming Conventions and Systems:
In
addition to defining community standards for the creation and use of
digital library materials, NDLF intends to work with others in the
community to address system architecture issues that must be resolved if a
national digital library is to be realized. Given the potential number
and size of digital library objects which will be created, the most likely
architecture is one that distributes the objects across the network in a
number of separate repositories (i.e., databases). A key problem in this
architecture is naming each object so it can be found once a user decides
s/he wants to look at it. The MoAII Testbed Project will evaluate
proposed naming standards and systems that will address this critical
component of a national digital library architecture.
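The naming problem can be pictured as a lookup from a location-independent name to whichever repository currently holds the object. The Python sketch below is a toy illustration of that idea only; it does not implement the Handle System, PURLs, or URN syntax, and the NameResolver class, the sample name, and the example.org addresses are all hypothetical.

```python
class NameResolver:
    """Maps location-independent object names to current repository locations."""

    def __init__(self):
        self._locations = {}

    def register(self, name, location):
        # Registering an existing name again records the object's new location.
        self._locations[name] = location

    def resolve(self, name):
        try:
            return self._locations[name]
        except KeyError:
            raise LookupError(f"unknown object name: {name}") from None


resolver = NameResolver()
resolver.register(
    "example-naming-authority/object-0001",
    "https://repository-a.example.org/objects/0001",
)
# The object moves to another repository; its name stays the same.
resolver.register(
    "example-naming-authority/object-0001",
    "https://repository-b.example.org/objects/0001",
)
```

Because users and finding aids would cite the name rather than the location, a link keeps working after an object moves, provided the registry entry is updated.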
HISTORY OF PRIOR RESEARCH
- Overview: Berkeley, the lead institution for this
project, has conducted a number of successful research projects that have
explored metadata and technical issues relating to the creation, access,
navigation, and use of digital facsimiles created from primary source
materials. These prior projects have included broad community involvement
with the objective of establishing agreement on a set of best practices
for the creation of digital libraries. The present project builds on this
prior research, as well as that of other entities. A description of the
more important prior research follows.
- The Berkeley Finding Aid Project
(Appendix G) was a
collaborative endeavor to test the feasibility and desirability of
developing an encoding standard for archive, museum, and library finding
aids (documents used to describe, control, and provide access to
collections of related materials). It involved two interrelated
activities: 1) the design and creation of a prototype encoding standard
for finding aids, and 2) building a prototype database of finding aids.
The project was supported by Higher Education Act Title IIA funds, and
software grants from Electronic Book Technologies and ArborText. A large
selection of the finding aids is available on the Berkeley Digital Library
SunSITE at http://sunsite.berkeley.edu/FindingAids/.
Subsequent
funding by the Commission on Preservation and Access and the Bentley
Historical Library resulted in adoption of EAD by the Society of American
Archivists, and its maintenance as a standard by the Library of
Congress.
- The California Heritage Digital Image Access Project
(Appendix
G) was funded by the NEH to demonstrate that USMARC collection-level cataloging records and standardized, electronic versions of archival finding aids, used together in the network environment, can provide access to and control of digitized images. The project created and tested a prototype digital image access system available on the Internet based upon SGML finding aid technology developed in the Berkeley Finding Aid Project.
(See http://sunsite.berkeley.edu/CalHeritage/).
- The American Heritage Virtual Archive Project
(Appendix G)
is an NEH-funded project to develop a demonstration union database
providing online access to distributed digital library resources. The
four major objectives include: 1) to develop protocols that enable
physically dispersed collections to appear as one; 2) to develop
mechanisms to navigate related collections; 3) to develop prototype
standards, policies and procedures for remote collaborative creation and
maintenance of a union database of catalog records and finding aids that
lead to digital images of primary source materials; and 4) to investigate,
develop, and test mechanisms to ensure seamless access and navigation
through the catalog and finding aids to images housed on remote servers.
(See http://sunsite.berkeley.edu/amher/).
- The UCEAD Project
(Appendix G) is testing production systems
for creating EAD records for Libraries within all nine campuses of the
University of California; the union database will form the foundation for
the development of a full-scale digital archive for the University of
California System (See
http://sunsite.berkeley.edu/FindingAids/uc-ead/).
- Other Related Work
(Appendices H & I): There is a
large body of other related work that will inform the Making of America II
Testbed Project. Some of the more important include: The Making of
America Project (Michigan and Cornell), Michigan's structural metadata
work (including J-STOR), OCLC/CNI work on the Dublin Core metadata set, and
the Library of Congress’s American Memory Project. Related work on
digital library architectures includes LC/CNRI investigations into digital
library architectures and object naming conventions (i.e., the Handle
System), OCLC's PURL system for naming digital library objects, the OMG
CORBA system, Microsoft’s DCOM, and RLG's Arches system. Project
investigators will closely watch the NSF/DLI projects for applicable
solutions, as well as the Larson/Watry research into cross-domain
searching. Work of the Text Encoding Initiative, CIMI, Getty Information
Institute and others will be regularly monitored. Of particular
importance will be the work of the CLIR/ACLS scholarly task groups.
METHODOLOGY AND STANDARDS
- Methodology Overview: The MoAII Testbed
Project’s methodology is significant for three reasons. First, it
continues applied research necessary to form the foundation for the
development of large-scale, usable distributed digital libraries by
creating a testbed where digital object metadata and naming conventions
and systems can be evaluated. Second, the methodology is designed to
disseminate the results of this project’s research into the library
community as a basis for the development of agreements about community
standards for the creation of those digital libraries. The MoAII Testbed
Project initiates this process by investigating best practices for naming
conventions and for certain classes of administrative and structural
metadata, and then provides the testbed where these definitions can be
evaluated. Finally, it initiates the creation of a focused digital
library collection, capable of being augmented through the years by many
additional institutions. The methodology for this project:
- Involves the library community
by bringing together the
participants, scholars, NDLF sponsors, libraries, experts and national
organizations to understand and agree upon metadata and architecture
issues.
- Creates a testbed
that links a union catalog of MARC
collection level records to a union catalog of EADs, then to distributed
repositories of digital surrogates of important primary source
materials.
- Evaluates the research
by using the testbed to understand
architecture issues related to the naming of digital objects and
definition of best practices for metadata. Equally important, the testbed
will allow humanities scholars and students to evaluate the usefulness of
these approaches and to advise the library community on research
directions.
- Disseminates the results
not only through the digital
library it creates, but also through the project website,
widely-distributed drafts of proposed practices, scholarly papers, and a
national invitational conference that will summarize the research of this
project and set future goals and research agendas.
The methodology to be used in this project builds on previous work and
is intended to serve as an essential component of the National Digital
Library Federation's exploration of digital library systems that will
deliver information seamlessly to users, regardless of where it resides on
the network. The present project is expected to lead to future research:
for example identifying and defining additional classes of digital library
objects; investigating ways of providing multiple views of the digital
content (for example, enabling school teachers to attach interpretative or
curricular materials to scholarly resources); archiving and migration, and
further refinement of a technical architecture for digital information
object repositories.
- Selecting Collections for the Project:
In order to create a
testbed and prototype digital library, the participants have chosen to
digitize at least 25,000 images from important collections relating to the
topic of "Transportation in America, 1869-1900", with a focus on
development of the railroads. Participants expect that the virtual
collection resulting from the project will have two main values. First,
it will be available to scholars, undergraduates and K-12 students as
examples of primary historical documents which can be used for a variety
of research and classroom purposes; and second, it will form the kernel of
a broader MoAII collection, to which additional institutions can
contribute. For a description of the overall focus of the proposed MoAII
digital library, see Appendix C; for descriptions of the individual
collections to be digitized relating to the theme of transportation, see
Appendix D. Collections, and individual items in the collections, will be
selected for their current and anticipated use, relationship to other
similar items in the collections of other participants, and risk of
deterioration from use of the originals. Curators of the collections will
consult with faculty or scholars as necessary during the selection
process. In addition, the selection will be informed by work of the
CLIR/ACLS Task Groups on Research Requirements for Various Formats of
Material (to be formed).
- Accessing the Digital Library: The Users’ Perspective:
Project participants will create a multi-level digital library structure
in which the highest level of access is through a union catalog of USMARC
collection-level catalog records accessible through a web interface, using
OCLC’s Site-Search software. The USMARC records will point to an
intermediate level of access consisting of Encoded Archival Descriptions
(i.e., finding aids for primary resource materials) of these collections.
The Encoded Archival Descriptions (EADs), which will be housed in a union
database for the purposes of cross-collection searching, will then point
to digital surrogates of primary resources. For examples of the prototype
discovery, navigation, and access system, see Appendix A.
Users of the prototype digital library will have the option of
beginning their search at the highest level of indexing (the catalog
record), or at the intermediate level (the collection of EADs). Users who
have links such as a URN (Uniform Resource Name) for a specific EAD or
digital surrogate can go directly to that item (i.e., a known item
search). Users who encounter a link to the EAD or digital surrogate while
browsing will also be able to follow that link to the item. A more
technical description of the access methodology can be found in Section
IV.F.2, The Digital Technology Plan: Access Overview.
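The multi-level structure described above can be sketched as a chain of links from catalog record to finding aid to digital surrogate. The Python example below uses in-memory dictionaries as hypothetical stand-ins for the union catalog, the EAD union database, and the repositories; the identifiers, titles, and example.org addresses are invented for illustration and do not describe the SiteSearch configuration or any actual project data.

```python
# Hypothetical in-memory stand-ins for the three levels of access.
catalog_records = {
    "coll-1": {
        "title": "Example railroad photograph collection",
        "finding_aid": "ead-1",
    },
}
finding_aids = {
    "ead-1": {
        "title": "Finding aid for the example collection",
        "surrogates": ["img-001", "img-002"],
    },
}
surrogates = {
    "img-001": "https://repository.example.org/img-001",
    "img-002": "https://repository.example.org/img-002",
}


def surrogates_for_finding_aid(ead_id):
    """Start at the intermediate level and follow links to the surrogates."""
    return [surrogates[s] for s in finding_aids[ead_id]["surrogates"]]


def surrogates_for_collection(record_id):
    """Start at the catalog record, then descend through its finding aid."""
    return surrogates_for_finding_aid(catalog_records[record_id]["finding_aid"])


def known_item(surrogate_id):
    """Go directly to one surrogate when its identifier is already known."""
    return surrogates[surrogate_id]
```

The three functions correspond to the three entry points open to a user: the catalog record, the finding aid, and a known-item link.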
-
Metadata Research:
Community standards for metadata are an important
element in the digital library architecture model proposed by NDLF. This
model incorporates three types of metadata associated with digital library
objects (for a background review of the proposed architecture, see
Appendix F):
- Intellectual metadata
are used to describe digital
objects so that they can be identified and "discovered". The NDLF
national digital library architecture envisions intellectual metadata
existing on two levels: a) USMARC records loaded into union and
local online catalogs; and b) intermediary indices that are used
to provide more detailed access (e.g., in this project, the EADs are an
intermediate access layer). Although the MoAII Testbed Project will not
focus on issues of intellectual metadata, both the USMARC record union
catalog and the EAD finding aid intermediate level access layer will be
provided in the MoAII Testbed Project database to allow the NDLF
architecture model to be tested, modified and evaluated.
- Structural metadata
are used to define the internal
organization of a given digital object so that a user can navigate within
the object.
- Administrative metadata are used to record
information about the digital object which will need to remain with the
object for its long term retention and use.
The assumption of the MoAII Testbed Project is that careful definition
of various classes of digital information objects is required in order to
enable users of the national digital library (which will be created by
geographically distributed contributors and will incorporate information
resources housed in distributed repositories) to find, navigate through,
and effectively use information. Project investigators will work with the
broader community to define: a) practices for creating and encoding
administrative metadata for digital images of primary source materials;
b) practices for defining structural metadata for the digital
objects used in the MoAII project; and c) alternatives for
encoding the digital objects themselves (i.e., practices for encapsulating
content, e.g., images, along with administrative and structural metadata
inside the object).
The development of agreements on these topics
is a necessary precondition for the establishment of large-scale
production-level digitization projects that can be built into an
interoperable, distributed digital library that can be migrated and
archived over time. Investigators will use the MoAII Testbed Project to
explore alternatives for metadata practices, including the encoding of
this information into digital objects and to help the broader community
understand the advantages and disadvantages of various approaches.
- Defining Structural Metadata for MoAII Digital Objects: The
EAD is a community standard for encoding finding aids. Automated systems
based on the EAD have enhanced access to primary source materials by
allowing scholars and students to view and navigate these digitized
archival collection descriptions over the Internet. The California
Heritage project demonstrated that the finding aids could be linked to
individual images which act as surrogates for primary source materials.
The American Heritage project demonstrated that the creation of a
multi-institutional virtual archive of finding aids is not only feasible,
but has the potential to enhance the use of primary sources.
While the EAD provides a community standard for finding aids, it does
not provide any guidance for creating or encoding the digitized surrogates
of the primary source materials pointed to by these finding aids.
Therefore, the next step in the process of developing seamless
access to digitized primary source materials through EAD-based finding
aids is to develop the related community practices for creating and
encoding the digitized versions of the primary sources.
Structural metadata is an important component of information
that must be captured and standardized as part of any process that
digitizes primary source materials, as it is this metadata that defines
the internal organization of an archival object (e.g., a book, diary,
manuscript, photo album, etc.). Computer programs can then use the
standardized structural metadata to display and navigate that object, at
the scholar’s request. For example, RLG has deployed a display/navigation
software tool called Documatic that will be investigated in this project,
as part of the effort to identify standard structural metadata
elements.
The internal organization of digital archival objects beyond a single
image, such as those used in the California Heritage Project, can quickly
become complex. In the last section, it was suggested that a digital
book-like object could have a structure such as chapters and pages.
In fact, it will have a rich and complex organization which could include
items such as title page, preface, table of contents, indices, references,
pages (possibly both image and text versions of the same page), chapters,
colophon, etc. This complex organization must be captured in the
structural metadata and become part of the digital archival object it
helps to define. In our book object example, access to the structural
metadata allows a computer program to implement book-like behaviors such
as turn to the next page, jump to page twenty, skip ahead four pages and
jump from the table of contents to a specific chapter by clicking on the
chapter name.
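The book-like behaviors described above can be illustrated with a minimal sketch in Python. It is not a proposed encoding: the class names, field names, and file names are hypothetical, and the structural metadata is reduced to an ordered page list plus a chapter table, which is far simpler than the rich organization a real book object would carry.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralMetadata:
    """Hypothetical structural metadata for a book-like object."""
    # Ordered page image identifiers (illustrative file names).
    pages: list[str]
    # Chapter title -> 0-based index into `pages` where the chapter starts.
    chapters: dict[str, int] = field(default_factory=dict)


class BookNavigator:
    """Implements book-like behaviors purely from structural metadata."""

    def __init__(self, meta: StructuralMetadata) -> None:
        self.meta = meta
        self.position = 0  # index of the current page

    def current(self) -> str:
        return self.meta.pages[self.position]

    def jump_to_page(self, number: int) -> str:
        # Page numbers are 1-based for the reader, 0-based internally.
        if not 1 <= number <= len(self.meta.pages):
            raise IndexError(f"no page {number}")
        self.position = number - 1
        return self.current()

    def skip(self, count: int) -> str:
        # Clamp so that skipping never runs off either end of the book.
        last = len(self.meta.pages) - 1
        self.position = max(0, min(last, self.position + count))
        return self.current()

    def next_page(self) -> str:
        return self.skip(1)

    def jump_to_chapter(self, title: str) -> str:
        # Equivalent to clicking a chapter name in the table of contents.
        self.position = self.meta.chapters[title]
        return self.current()


# Illustrative 30-page book with two chapters.
meta = StructuralMetadata(
    pages=[f"page-{n:03d}.tif" for n in range(1, 31)],
    chapters={"Chapter 1": 4, "Chapter 2": 17},
)
```

With this sketch, `BookNavigator(meta).jump_to_page(20)` followed by `skip(4)` lands on the twenty-fourth page image, and `jump_to_chapter("Chapter 2")` moves to the page on which that chapter starts. The navigator itself holds no knowledge of the book beyond what the metadata supplies.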
The problem becomes even more complex because the structural metadata
for any given object, like a digitized book, may vary based on how it was
created. For example, a digital book that is created from scanned page
images could have different structural information than one which had its
text converted from an OCR process. The former is a series of images
linked together by the structural metadata to provide book-like behaviors.
The latter is a string of text characters in which the concept of a page
is less clear. However, in both cases, one would still want the book to
support similar behaviors, such as jumping to chapter three. Another
complication that needs to be addressed is how to handle structural
metadata that may also be intellectual metadata. For example, chapter
headings may be indexed for searching (i.e., intellectual metadata) and
also be used for navigating an object (i.e., structural metadata: e.g.,
jump to chapter 3).
The value of providing standardized structural metadata for classes of
digital library objects, such as books, manuscripts, etc., is that it
makes it easy to share and use objects on a national basis. NDLF
envisions that digital objects will be stored in distributed repositories
spread across the country. Given the size and number of digital library
objects available on a national basis, this seems the only reasonable
approach. However, if each repository used its own proprietary structural
metadata scheme, this would mean the computer programs used to display and
navigate the objects would need to be rewritten for each repository. The
result is that users would have to deal with proprietary navigation tools
for each repository accessed. By standardizing the structural metadata, a
single navigation tool could be used across all repositories. If a
particular organization had a need to develop a specialized display and
navigation tool for a class of archival object, it could be done once and
would still work across all repositories.
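The benefit of a shared scheme can be sketched in a few lines of Python. The example assumes, purely for illustration, that every repository can export an object's pages in reading order through one agreed-upon call (here named `page_sequence`, a hypothetical name), whatever its internal storage looks like. A single navigation routine then serves both repositories.

```python
from typing import Protocol


class Repository(Protocol):
    """Any repository that can export the shared (hypothetical) scheme."""

    def page_sequence(self, object_id: str) -> list[str]:
        """Return the object's pages in reading order."""
        ...


class ListBackedRepository:
    """Stores page order internally as a plain list."""

    def __init__(self, objects: dict[str, list[str]]) -> None:
        self._objects = objects

    def page_sequence(self, object_id: str) -> list[str]:
        return list(self._objects[object_id])


class NumberedRepository:
    """Stores pages keyed by sequence number, in no particular order."""

    def __init__(self, objects: dict[str, dict[int, str]]) -> None:
        self._objects = objects

    def page_sequence(self, object_id: str) -> list[str]:
        numbered = self._objects[object_id]
        return [numbered[n] for n in sorted(numbered)]


def next_page(repo: Repository, object_id: str, current: str) -> str:
    """Single navigation routine shared by every repository.

    Returns the page after `current`, or `current` itself at the end.
    """
    pages = repo.page_sequence(object_id)
    index = pages.index(current)
    return pages[min(index + 1, len(pages) - 1)]


# Two illustrative repositories with different internal layouts.
repo_a = ListBackedRepository({"diary-1": ["d1.tif", "d2.tif", "d3.tif"]})
repo_b = NumberedRepository({"album-7": {2: "p2.jpg", 1: "p1.jpg"}})
```

Here `next_page(repo_a, "diary-1", "d1.tif")` and `next_page(repo_b, "album-7", "p1.jpg")` run the same code path. Without the shared export, `next_page` would have to be rewritten for each repository's internal layout, which is the situation standardization is meant to avoid.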
An important research objective for this project is to work with the
participating institutions, NDLF sponsors, other research libraries and
national organizations to: a) identify the classes of digitized
archival source materials that require standard practices for structural
metadata encoding (e.g., books, manuscripts, photographic albums, single
images, diaries, pamphlets, etc.); b) describe the attributes and
behaviors that these classes must exhibit, and then to develop structural
metadata practices that will allow them to be used in a uniform manner
across repositories (for example, should all book objects be able to
support the behavior of turning to the next page, the previous page,
jumping to a particular chapter, linking from an index entry to the
appropriate reference, setting and returning to a bookmark, etc.?);
c) investigate encoding schemes for structural metadata, such as
SGML/TEI, table-based models, etc.; d) create the MoAII Testbed in
which the proposed structural metadata standards and encoding schemes can
be demonstrated widely and then evaluated by the community.
The MoAII Testbed Project will be based on the work undertaken by
Cornell and Michigan in the Making of America I Project (Appendix H) which
investigated access to digital surrogates of books, and on the Berkeley
E-Bind project (Appendix J) which has begun to explore structure for other
archival materials (see http://sunsite.berkeley.edu/Ebind/).
The goal of this research objective is to develop a national consensus on
a structural metadata best-practice and/or standard for these categories
of materials, as was done for finding aids (i.e., the EAD standard). The
structural metadata, along with the administrative metadata, will be
encoded with the digitized content of the primary source material to
create a digitized archival object that can be stored in a distributed
repository. The result will be to ensure that each class of digitized
primary source material, once discovered by scholars and students through
the EAD standardized finding aids, can be viewed and navigated in an easy
and consistent manner, regardless of its internal structural complexity or
source of origin on the network.
- Defining Administrative Metadata For Images: Another
important research focus of this project will be to build consensus on
issues of administrative metadata. We will focus primarily on
administrative metadata created before or at the time of digital image
conversion of library resources rather than on metadata associated with
derivative files. During the course of this project, MoAII Testbed
Project participants will draw upon the combined expertise of the group,
as image technicians and end users, to define an administrative metadata
set, focusing on point of capture. We believe that working on these
issues with this group of experienced participants will pave the way for
later decisions about such issues as system architecture and archival
strategies for refreshment and migration, as well as a variety of other
topics.
During the MoAII Testbed Project, participants will compare and
discover best practices for project management of data, especially keeping
in mind the notion of entering/capturing metadata once and then
using/reusing it for a variety of purposes. Administrative metadata must
serve short term project management as well as long term file management
purposes and form the basis (an "electronic colophon") for future users
who need to analyze the images and their derivatives. We expect the
metadata to be used by different audiences, so that any model adopted must
be adaptable; the information will need to be dynamic, changing as files
are used and new formats derived, as well as permanent, to carry notice of
the requirements for using the files, particularly critical in thinking of
technical change and obsolescence. Careful definition and encoding of
administrative metadata are critical to future archiving and migration of
digital objects.
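The dual requirement stated above, that administrative metadata be permanent in part and dynamic in part, can be sketched as follows. This Python example is only an illustration under stated assumptions: the element names (`source_id`, `device`, `resolution_dpi`, and so on) and the sample values are hypothetical and do not represent the metadata set this project will define.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaptureInfo:
    """Recorded once at the point of capture and never changed."""
    source_id: str
    capture_date: str      # ISO 8601 date, e.g. "1998-08-14"
    device: str
    resolution_dpi: int
    file_format: str


@dataclass
class AdministrativeMetadata:
    """Permanent capture data plus a history that grows over time."""
    capture: CaptureInfo
    # Appended to as derivatives are made or files are migrated.
    history: list[str] = field(default_factory=list)

    def record_event(self, date: str, description: str) -> None:
        self.history.append(f"{date}: {description}")

    def colophon(self) -> str:
        """Summarize how the image was made and what has happened since."""
        c = self.capture
        first = (
            f"{c.source_id}: captured {c.capture_date} on {c.device} "
            f"at {c.resolution_dpi} dpi as {c.file_format}"
        )
        return "\n".join([first] + self.history)
```

The capture portion is frozen so that it can be entered once and reused unchanged, while `record_event` lets the same record follow the file through derivation and migration. The `colophon` method suggests how one record might serve both project management and later analysis of the image.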
At the end of the project, we expect to have a well-defined and tested
set of administrative metadata for image capture. Through group
discussion and usage we will be able to reach a consensus that will prove
to be robust under the scrutiny of the larger community. We will
integrate this with an accompanying "best practices" document.
This will provide solid image metadata so that a future systems
architecture group can build tools and structures for these digital
resources on a solid foundation. By establishing and evaluating best
practices in the MoAII testbed system, we will contribute to the creation
of more permanent archival standards and procedures, constituting, in
effect, community metadata standards. 1
Although administrative metadata relating to intellectual property
rights are equally important, and project staff will do background
research into existing practices, it is unlikely that this project will be
able to explore the relevant issues fully and propose practices.
- Research on Naming Conventions and Systems (see Appendix I for
additional information):
- The Need for Naming Conventions and Systems:
The NDLF
envisions a national digital library architecture that will populate
distributed repositories with digital library objects. While developing a
fully distributed architecture for a national digital library is beyond
the scope of this project, it is desirable to begin work on specific
community practices that will be incorporated into such an architecture.
The first of these practices that must be addressed is the naming of
digital library objects. The methodology discussion to this point
has described the process that creates a digital archival object by
encapsulating the encoded administrative and structural metadata along
with the digitized primary source material. Once a digital archival
object is created, it must be named so it can be retrieved later for use
by scholars and students.
In time, the population of digital objects that act as surrogates for
books, journals, manuscripts, photographic images, video recordings, audio
presentations, multimedia productions, etc., will grow into the
millions. A key component in realizing a national digital library
architecture is developing a standard naming convention that allows each
digital object to be uniquely named in order that it may be located on
demand in its home repository. As important, the naming convention must
be supported by a network based system that can create distributed naming
authorities to manage the images and their unique names (i.e., assigning
names to newly created images, renaming, moving an image to a new storage
location, etc.).
The naming convention must: a) allow for distributed
naming authorities so each unique digital library object can be named by
its owning institution; b) create a persistent name (e.g., a URN, or
uniform resource name) that will remain the same even if the object is
moved to a new location on the network; and c) be able to retain
information on multiple instances of an object (e.g., an object might be
mirrored in another repository for performance reasons).
In addition, the naming convention must be implemented within a name
creation/resolution system that is reliable (available 24 hours per day
and guaranteed by mirrored sites) and minimizes network traffic where
possible. Finally, it must implement a distributed name administration
service allowing each naming authority with proper security access to add,
edit, and delete names for their digital library objects.
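The requirements listed above can be made concrete with a toy resolver written in Python. It is a sketch only and does not reproduce the interface of the CNRI Handle System, the OCLC PURL system, or any other existing service. The authority-prefix rule, the class and method names, and the sample names and addresses are all assumptions made for illustration, and reliability concerns such as mirroring are left out.

```python
class NameRegistry:
    """Toy name resolver: persistent names mapped to current locations."""

    def __init__(self) -> None:
        # name -> list of locations (several when the object is mirrored)
        self._locations: dict[str, list[str]] = {}

    @staticmethod
    def _check(authority: str, name: str) -> None:
        # A naming authority may only manage names under its own prefix.
        if not name.startswith(authority + "/"):
            raise PermissionError(f"{authority} does not own {name}")

    def register(self, authority: str, name: str, location: str) -> None:
        """Add a name, or add another instance of an existing name."""
        self._check(authority, name)
        self._locations.setdefault(name, []).append(location)

    def move(self, authority: str, name: str, old: str, new: str) -> None:
        """Change one location; the name itself stays the same."""
        self._check(authority, name)
        locations = self._locations[name]
        locations[locations.index(old)] = new

    def delete(self, authority: str, name: str) -> None:
        self._check(authority, name)
        del self._locations[name]

    def resolve(self, name: str) -> list[str]:
        """Return every known location for a persistent name."""
        return list(self._locations[name])
```

In use, an institution would register a name such as `berkeley/album-0001` against a first address, register it again against a mirror, and later call `move` when the primary copy changes servers. Anyone holding the name continues to resolve it without noticing the change, and a call made under another authority's prefix is refused.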
- Naming Convention Research Methodology: The Internet
Engineering Task Force (IETF) has been investigating specifications for a
URN that can be used to name digital objects on the Internet. Already,
there are at least two prototype naming support systems that are based on
the work of the IETF: the OCLC PURL system and the CNRI Handle System.
In addition, the Object Management Group (OMG), a consortium of over 700
corporations, is developing a technology for naming and managing digital
objects in distributed repositories, called CORBA (Common Object Request
Broker Architecture). Finally, Microsoft is promoting an architecture
that competes with CORBA called DCOM (Distributed Component Object Model).
For additional information about these various naming schemes, see
Appendix I.
This project will investigate the following architecture research
objectives:
-
opportunities and shortcomings of using URNs to name digital
library objects within an architecture that could support a national digital library;
-
the functionality of the prototype systems that support URN
creation, management and name resolution to determine if they are adequate
for use in a national digital library architecture;
-
the possible role CORBA and/or DCOM naming and distributed
object support systems may play in a national digital library
architecture (note: these object-based technologies are not necessarily
in competition with URN-based naming and could very well complement the
use of URNs); and
-
the possibility of creating best practices for digital naming
conventions that are informed by the research conducted in this project
and accepted by the community.
In order to achieve these research objectives, the project will add to
its testbed a CNRI Handle Server and an OCLC PURL server to be used to
evaluate the issues in using URNs to name digital library objects. The
testbed will also be used to investigate the functionality required in a
URN based system for creating and supporting naming authorities, creating
and managing individual names for digital library objects and for the
resolution of names into network based addresses needed to access each
object. The project will also investigate both CORBA and DCOM
technologies to see what role these may play in naming digital library
objects and, if appropriate, add these technologies to the testbed.
It is significant to note that CNRI and OCLC have agreed to participate
in this project. It is expected that the use of their prototype systems
in the MoAII Testbed Project will lead to enhancements in their systems
that will benefit the community as a whole.
Standards and Best Practices: The project will use
established practices and standards where they exist (for example MARC,
Z39.50, SGML and EAD) and will attempt to develop common practices and
community standards where they do not yet exist. Since the project will
use easily available commercial software (for example, SiteSearch, SGML
authoring tools, and standard web-browsers), no software development will
be needed. Limited systems integration work will need to be done.
Because of its experience with the technology, the access model,
operations, and production, Berkeley will take the lead among the NDLF
collaborators and offer its methods as a basis for this project. As noted
above and in Appendix G, these methods have been developed through a
series of R&D projects. While Berkeley’s procedural methods will act
as guidelines for the other collaborators, strict adherence to them will
not be required and they may be changed generally as a result of the
collaborative work. Flexibility is an important aspect of the production
methodology the consortium of participants wishes to demonstrate, because
in the real world the production of data is decentralized. The different
procedural methods used by participants will be documented and analyzed as
part of the evaluation of the project so that they can be adopted as
production methods for later participants in the fully fledged MoAII
Project.
Although procedural methods will vary, project participants will
actively participate in the development of, and adhere to, agreed-upon
practices governing the encoding, pointing/linking, and quality of data
capture (USMARC for collection-level records; EAD for finding aids;
Uniform Resource Names (URN) and Uniform Resource Locators (URL) for
naming resources; administrative metadata for digital images; and the
structural metadata practices developed for multi-image digital objects).
The use of best practices and standards is the key element that will allow
the project to succeed in bringing together a diverse group of collections
dispersed among five institutions into a single, coherent, access
system.
PLAN OF WORK
- Overview: This is a highly collaborative project involving
five sponsors of the National Digital Library Federation (Cornell, New
York Public Library, Penn State, Stanford, and UC Berkeley). Because of
the number of institutions and the technical and intellectual challenges,
the project is complex. The plan of operation calls for a tightly
scheduled sequence of events occurring over two years and requiring the
close cooperation of many staff members in a number of different
departments at the many collaborating institutions. The plan of
operations will be divided into three main phases:
The planning phase (beginning July 1, 1997 and continuing
until April 30, 1998) will be funded by NDLF and the participating
institutions. During this year, participants will set the foundation for
the work to be carried out with NEH funds beginning in May 1998. The
research and production phase (beginning May 1, 1998 and
continuing for one year) is the MoAII Testbed Project being proposed for
funding by NEH and the participants. It is during this phase that the
core research and community standards-building process will take place and
the digital library prototype will be created. During the
dissemination phase (immediately following the NEH-funded
investigations) the NDLF will provide funding for dissemination and
community review of the results. The following is a description of the
work to be done in each phase:
- Planning Phase (July 1, 1997 through April 30, 1998):
Participating institutions, at their own expense, will do the following:
-
Make final selection of collections to be included in this
project. To be completed by August 1997.
-
Create USMARC collection-level records for resources to be
included in the project. To be completed by April 30, 1998.
-
Author finding aids for each collection selected, then mark
them up using EAD. Decentralized authoring will be done using any of
several commercially available SGML authoring tools. Berkeley can provide
software recommendations to collaborators who desire advice. Although
most participants already possess expertise in the creation of EAD
records, those that do not will take responsibility for gaining it. For
example, they can attend training sessions provided by RLG; alternatively,
Berkeley is willing to offer a session to participants and other libraries
at cost. To be completed by April 30, 1998.
-
Gain expertise in scanning required for work in the research
and production phase (second year) of the project. Most participants
already have substantial experience in scanning. If necessary, Cornell
University is willing to offer training to participants and other
libraries at cost. Each institution will identify staff with this
knowledge to serve as "technical liaisons" for the project
(minimally one individual per institution). To be completed by Fall
1997.
-
Maintain the centralized MARC and EAD records at Berkeley
throughout the duration of the NDLF and NEH projects. Participants may
also mount their MARC records in their OPACs for local access and/or on
the bibliographic utilities. Several will also store their EADs locally,
with links from local MARC records and also links to the digital images of
the collections described in the EADs. Local infrastructure decisions to
be made by Winter 1997.
-
Participants will also develop and implement their own
system or production database for tracking images and image metadata
during the production phase of the project. Berkeley has created a local
system to track images created for the California Heritage and Digital
Scriptorium projects, and will share these production and tracking methods
with other participants. Cornell also has expertise in this area, which
will be shared. To be completed by April 1998.
Staff at Berkeley and Cornell, in consultation with Dr. Howard
Besser, will review imaging practices from a wide variety of projects,
including, for example, Making of America, JSTOR, Digital Scriptorium,
Getty Information Institute and others. Based on this review, procedures
for the production phase of this project will be developed, and draft
practices created by Fall 1997. On this foundation, participants will
then re-assess their scanning expertise and work with Berkeley and
Cornell staff to refine production plans (Winter 1997-Spring 1998).
-
While participants are conducting the work described above,
Berkeley will concurrently take the lead in developing naming as well as
administrative and structural metadata proposals for participants’ review
as foundation to the research and production phase of the project.
-
Working with Dr. Besser, staff at Berkeley will identify
issues, and create draft community recommendations for capture and
encoding of administrative metadata. This work will involve the other
participants in this project, other libraries, digital library experts,
computer scientists, and consultants as appropriate and will be completed
by Winter 1997.
-
Berkeley, in consultation with Dr. Besser, and in extensive
collaboration with participants, other libraries (especially Michigan and
Cornell), external experts, and the Library of Congress, which has
developed mechanisms for navigating through complex objects included in its
digital library, will take the lead in developing proposed structural
metadata models. Berkeley and the collaborators will work closely with
the CLIR/ACLS Scholarly Task Forces to determine the behaviors that
scholars expect from particular types of digital objects. Researchers
will analyze the content proposals from the participants to identify major
potential classes of information objects that will need structural
metadata defined. They will develop and test prototype practices for
structural metadata in the NEH project year; a draft from Berkeley is due
to participants by Winter 1997.
-
Berkeley will install a CNRI Handle Server and an OCLC PURL
server for the project. Berkeley will analyze Handle, PURL, CORBA, and
DCOM protocols, and consult with computer scientists as well as experts in
industry to develop a thorough understanding of the various alternatives
available for naming. Berkeley will collaborate with CNRI to develop
proposed naming conventions and will work with the other participants to
create appropriate naming authorities and to determine the exact
methodology for evaluating these services. To be completed by Spring
1998.
- Research and Production Phase (May 1, 1998 through April 30,
1999):
The following plan describes production, training, standards
refinement, and feedback plans.
- Review of EADs:
The project will begin with Berkeley's
review of the EADs created by each institution during the planning phase.
The American Heritage Virtual Digital Archive project will have developed
recommendations for a range of acceptable practice for encoding data with
the EAD. These recommendations will be further tested, verified, and
modified if necessary. Participants will then revise the EADs if
necessary. This review will be completed by June 1998.
- Training:
In June 1998, a one-week intensive training workshop
for the participants' Project Coordinators and technical liaisons covering
all aspects of the project will be given at Berkeley by the Project
Manager with the assistance of staff from Berkeley's Electronic Text Unit
(ETU), Dr. Howard Besser, staff from the preservation departments at
Cornell and Berkeley, and other experts as appropriate (possibly from
CNRI, the Library of Congress, or other institutions). The workshop
curriculum will introduce participants to the draft practices for creating
administrative and structural metadata to be used in the project,
conversion methods and image capture guidelines for various source
documents, discussion and development of standard naming conventions,
pointing and linking information (URN/URL), and a thorough discussion of
various production methods. During the period of instruction, discussion,
and hands-on training, procedures will be modified as necessary. These
procedures will then serve as norms for the remainder of the project,
being modified periodically through a process of community review.
Berkeley will host a private listserv specifically dedicated to this
project so that participants can work together at all times.
During Fall 1998, Berkeley staff will conduct site visits to each
participating institution to review work, monitor progress, give further
instruction as necessary, respond to questions, or gather recommendations
for change.
Following the production phase, Project Coordinators and technical
liaisons will return to Berkeley for a January 1999 workshop to finalize
recommendations for practices to be used in subsequent projects, which
will be formally introduced to the library community in the third
phase.
- Production and Integration of Data
- Union Catalog:
Berkeley will create the union catalog for this
project, and will accept MARC records from the participants through FTP.
Records with appropriate links (i.e., handles) to EAD will begin coming in
June 1998 and will all have been loaded by December 1998.
- Union Database of Intermediate Metadata: EADs
reviewed by Berkeley’s Electronic Text Unit, and edited by participants,
will be FTP'd into the union database at Berkeley. EADs with appropriate
links (i.e., handles) to images will begin coming in June 1998 and will
all have been loaded by December 1998.
- Creation of Digital Images: Images will be
produced, along with administrative and structural metadata; the images
will be named, linked to EADs, and EADs to collection-level records. Many
project activities will overlap and a number of teams will be working
concurrently on the different aspects of the project at each of the
participating institutions. Approximately 4000-5000 images will be
captured by each participating institution. Image conversion should begin
at each institution by August 1998 and be completed by December 1998.
Metadata will be created on a similar schedule, and mechanisms for its
processing established for each institution and the entire project.
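Because collection-level records link to EADs and EADs link to images, one mechanism for processing the metadata might be a routine check that every link resolves. The Python sketch below assumes a deliberately simplified representation: each MARC record carries one name pointing to its finding aid, and each finding aid carries a list of image names. The function name, the data layout, and the sample identifiers are hypothetical.

```python
def unresolved_links(marc_records, finding_aids, images):
    """Return (source id, missing name) pairs for links that point nowhere.

    marc_records: dict mapping MARC record id -> name of its finding aid
    finding_aids: dict mapping finding aid name -> list of image names
    images:       set of image names that actually exist
    """
    missing = []
    for marc_id, ead_name in marc_records.items():
        if ead_name not in finding_aids:
            missing.append((marc_id, ead_name))
    for ead_name, image_names in finding_aids.items():
        for image_name in image_names:
            if image_name not in images:
                missing.append((ead_name, image_name))
    return missing


# Illustrative data: one finding aid points to an image not yet loaded.
marc_records = {"marc-001": "ead/collection-a", "marc-002": "ead/collection-b"}
finding_aids = {"ead/collection-a": ["img/a-0001", "img/a-0002"]}
images = {"img/a-0001"}
```

Run against the sample data, the check reports that `marc-002` points to a finding aid that has not been loaded and that `ead/collection-a` points to an image that does not exist yet. A report of this kind could be produced as records arrive, well before the prototype is opened to users.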
The steps followed by each of the collaborating institutions will be
similar to those described below for Berkeley's part of the project.
Collections will be prepared for digitization by project staff (for the
collections included in the project see Appendix G). A selection team
will develop and document an image selection plan (since, for practical
reasons, not all of the contents of selected collections will be
digitized). The selection plan will determine criteria for providing
title, name, subject, and genre access, where appropriate, to individual
images and to "clusters" of related images, which will be retrieved as
groups.
At Berkeley, the selection plan will be undertaken by the project team
in The Bancroft Library Technical Services Department (BTS), according to
guidelines established by the curator, Dr. Bonnie Hardwick, who will
consult as necessary with faculty. Selected items will be digitized by
the Library Photographic Service (LPS), an in-house service agent.
Digital reproduction may be done in several ways, including flatbed
scanning, conversion of photographic intermediates, or digital camera.
Participants will use agreed upon standards for image capture when they
exist; but establishment of imaging standards is not a primary focus of
this project.
The selection of the images and recording of administrative and
structural metadata for the 5,000 items at Berkeley will take
approximately five months (June 1998-October 1998). The collaborating
institutions will follow roughly the same time table, even if they choose
to employ different procedures. Coordination of the participants will be
managed by the Project Manager at Berkeley, who will also monitor the
production schedule and the quality of the data produced by all
participants during the course of the project; ensure that the methods
used follow project guidelines and are cost-effective; and confirm that
the required number of images is produced.
- Storage of Digital Images: Although the participants'
standardized intellectual metadata (MARC collection-level records and
SGML-encoded finding aids) will be integrated on a central server at
Berkeley, the digital images of items from their collections, along with
associated structural and administrative metadata will be stored locally
on the participants' own servers. Berkeley will offer technical advice
and consulting services for participants throughout the project.
- Public Access:
Berkeley will maintain a website throughout the
project, describing the project and providing periodic updates about
progress. The project's website will also make metadata and naming
proposals available for public comment and point to the union databases of
MARC and EAD records. As completed collections become available (starting
in September 1998), EADs with images will be linked to the project
website, comprising a prototype digital library.
- Evaluation and Dissemination of Project Results:
The project
will be under continuous evaluation by participants, scholars, technical
consultants, and the library community; standards, practices, and
procedures will be modified throughout the course of the project as a
result of experience, deliberation, and review of related work occurring
simultaneously elsewhere. The Interactive Media Group at Cornell will lead
a comprehensive, formal review of the project, including user studies. A
variety of methods will be used to ensure wide dissemination of
information about the project while it is in progress, to allow feedback
from users and other libraries about the prototype practices for
administrative and structural metadata, and naming conventions; and to
present formal results. For a full description of the evaluation and
dissemination plans, see sections VI and VII, below.
- Dissemination Phase (Beginning June 1999): Following
completion of the Research and Production Phase, the NDLF will fund an
invitational seminar to review project results. Participants will include
representatives of a broad spectrum of fields and interest groups,
including, for example, digital library experts, archivists and special
collections librarians, scholars, computer scientists, museum
technologists, and others who have participated in other phases of
development of the EAD protocols, are engaged in similar work, or who have
appropriate expertise. The results of this phase will include widespread
dissemination of the results of the project, refinement as necessary of
the practices established, and formulation of an agenda for further
community review and acceptance.
SUMMARY: PROJECT TIME TABLE
| | Planning Phase: July 1, 1997 through
April 30, 1998 |
| Aug. 1997 | Select collections to be included
in the project; create USMARC collection-level records as needed in OCLC
or RLIN, following national standards. |
| Sept. 1997 | Identify staff to function as
project Technical Liaisons as well as Project Coordinators. |
| Oct. 1997 | Begin creating Finding Aids for
each selected collection in compliance with the EAD SGML DTD; all Finding
Aids will be created by April 30, 1998. Administrative image metadata
model for project distributed by Berkeley for participant and
broader review. |
| Nov. 1997 | Draft practices for image capture
documents completed by Berkeley and Cornell staff and circulated for
review. |
| Dec. 1997 | Structural metadata model, based
on collection descriptions, distributed by Berkeley for review by
participants and scholars. |
| Jan. 1998 | Participants to review in-house
scanning expertise and experience and overall infrastructure, relative
to models offered (draft practices and models above); additional training
and documentation needs identified and solutions planned. |
| Spring 1998 | Participants develop documented
rationale for selection of items within collection based on developing
project models and practices; each institution to identify up to items
for conversion. |
| Spring 1998 | Berkeley will procure a Handle
server for the project; initial testing to be completed for operation by
March 30, 1998. |
| | Research and Production Phase: May
1, 1998 through April 30, 1999 |
| June 1998 | Berkeley reviews participants' EAD
work on finding aids for collections of this project. Participants'
Training Workshop at Berkeley, for Project Coordinator and Technical
Liaison from each participating institution (one week). Project homepage
and listservs begun. |
| Aug. 1998 | Berkeley installs union catalog
for collection level MARC records for project and begins receiving
records from participants via FTP. |
| Sept. 1998 | Berkeley installs EAD union
catalog and begins receiving EADs from participants via FTP. |
| Aug.-Dec. 1998 | Image conversion at each
institution: retrieve collections, digitize and load images locally,
assemble metadata, including adding handles to MARC catalog records and
EADs. |
| Nov. 1998 | Site Visits: Berkeley Project
Leader to visit each participant's site to monitor progress. |
| Dec. 1998 | Evaluation Team (EVAL) preliminary
work: to present model at January meeting. |
| Jan. 1999 | Second meeting of Project
Coordinators and Technical Liaisons from each participating institution,
to revise working documents and finalize models based on their use.
Also, introduction of evaluation procedures. |
| April 30, 1999 | Evaluation process and final
report for NEH completed. |
| | Dissemination Phase: May 1, 1999
through August 30, 1999 |
| May 1--Aug. 30, 1999 | NDLF-sponsored
invitational conference on MoAII Testbed Project results held. Conference
report published within 90 days. Project staff write and deliver
papers; demonstrate prototype at conferences and meetings. |
Management Plan: The UC Berkeley Library will be the lead
institution among the project's collaborators. The project will be managed
by the UC Berkeley Library through the Berkeley campus Sponsored Projects
Office in accordance with all applicable Federal and University
guidelines. Coordination of the participating institutions will be
overseen by a Management Council comprising the Project Director, Bernard
J. Hurley; a Project Manager to be hired; and project managers from each
participating institution. Key personnel from Berkeley's project team
will coordinate each step of the project with the other collaborating
institutions and advise them on procedures and technical matters connected
with the project. A project manager at each institution will guide local
production of catalog records, finding aids, and images; serve as liaison
to Berkeley's management team; participate in the consortial management
team for the project; and assist in evaluation. Each partner will also
designate a technical liaison to coordinate technical implementation with
the Berkeley team. OCLC and The Research Libraries Group will each
provide a liaison to this project, and CNRI will serve as technical
partner on development of naming protocols. All of the principals in all
of the participating libraries have extensive experience in managing
complex projects and in collaborative efforts among libraries. The NDLF
Planning Task Force and its architecture group will review the project’s
progress regularly. Work will be closely coordinated with the CLIR/ACLS
scholarly task forces. The Management Team includes:
- Principal Investigator:
Dr. Peter Lyman will
provide policy oversight of the project, coordinating project activities
at the policy level with the other participating institutions'
administrators. He will assure the continued development of the prototype
access system and the maintenance of the data created in the project. He
will participate in disseminating the project results and will represent
the project to national bodies such as the Association of Research
Libraries, the Council on Library and Information Resources, the National
Digital Library Federation Policy Board, and the Coalition for Networked
Information.
- Project Director:
Mr. Bernard J. Hurley will lead the design
of the research methodology and technical architecture of the
demonstration project. He will have primary responsibility for planning
and directing each phase of the project. He will direct the evaluation of
software and hardware and oversee the development of the prototype system.
He will consult with CNRI on the development of the Handle System URN
management software. He will serve as administrative liaison with the
Project Coordinators from the collaborating institutions, and will be the
project's representative to the NDLF Architecture Group. The Project
Manager will report to him, and he will participate in efforts to
disseminate the project's results.
- Consultant:
Dr. Howard Besser will provide expertise on
imaging and metadata. He will assist in the development of metadata
proposals, serve as liaison with related projects elsewhere, and assist
with dissemination.
- Project Manager:
A Project Manager, reporting to the Project
Director, will be hired to supervise implementation of the project. S/he
will assist in the design of the prototype architecture with the
assistance of the Project Director. S/he will coordinate operations among
the participating institutions, develop the training curriculum and carry
out the training of the Project Coordinators of the collaborating
institutions. Following training, s/he will visit each of the
collaborators to provide follow-up support and consultation. The project
manager will serve as an SGML technical consultant to the project and s/he
will also coordinate and supervise Electronic Text Unit staff mounting and
publishing the marked-up texts for Berkeley. S/he will monitor the quality
of all project finding aids. S/he will participate in efforts to
disseminate the results of the project.
- Archival Control Coordinator:
Mr. Jack von Euw, Head of
Technical Services in The Bancroft Library, will oversee Berkeley’s
selection of cataloging records, finding aids, and images. He will consult
with collaborating institutions concerning finding aids and the
integration of related collections in the prototype. He will participate
in efforts to disseminate the results of the project.
- Curator of the Bancroft Library Western Americana Collections:
Dr. Bonnie Hardwick will assist in the creation of the project's materials
selection plan and will participate in the process of selecting individual
images for digital reproduction. She will serve as liaison with curators
in the participating libraries, and with scholars. She will also provide
assistance in the authoring of finding aids for some of the collections
used in the project. She will participate in the dissemination of the
project's results.
- Berkeley Library Imaging Coordinator:
Mr. Barclay W. Ogden,
Head of the Conservation Department at Berkeley, will consult with
collaborating institutions on imaging and procedures for photo and digital
reproduction of the materials used in the project. He will supervise
Berkeley's imaging process, resolving any questions and ensuring that
quality and production goals are met. He will assist in the development
of the administrative metadata recommendations and participate in
dissemination of the project results.
- Evaluation Coordinator:
Dr. Geri Gay, Director of Cornell
University's Interactive Multimedia Group (IMG), will develop and carry
out the evaluation of the prototype system. She will develop the
evaluation guidelines and questionnaires. She will work with the
collaborators to make sure that these reflect their concerns about
performance of the prototype. She will supervise the implementation of the
evaluation procedures, including distributing evaluation materials. She
will compile and analyze the evaluation data. She will write the
evaluation report for inclusion in the project's final report. She will
participate in the dissemination of the project's results.
- Participating Institutions Project Coordinators:
For resumes
of project directors and other key staff at the participating institutions
see Appendix M.
- Technical Staff:
The UC Berkeley Systems Administrator
(Programmer Analyst III) will install the Handle System, a URN management
and resolution system developed by CNRI. The Systems Administrator will
install and support Unix-based central software (OCLC's SiteSearch, DynaText,
DynaWeb, ArborText, etc.), configure the Unix kernel to support new
hardware (e.g., magnetic disk), and tune and monitor the system's
performance. The Programmer Analyst III will use the Handle System to add
URN/URL support to the DynaText web browser and the image viewer.
The Production Controller (Programmer Analyst I) will be responsible
for receiving (via FTP) the metadata from all participants and
loading/indexing it into DynaText on a regular, frequent schedule. S/he
will provide technical support to project participants.
An SGML DTD specialist will be hired to develop structural metadata
proposals. The incumbent will coordinate this work with similar work
being done elsewhere, for example at Michigan, through TEI, etc.
- Image Selection Staff:
A full-time Library Assistant IV will
serve as work leader for Berkeley's selection of images and its assembly
of cataloging records and finding aids; two Library Assistant IIIs will
assist the work leader for Berkeley's selection of images; four student
assistants will assist the library assistants with the selection of images
and the assembly of cataloging records and finding aids. They will also
assist with other support functions connected with the imaging workflow
and finding aid conversion.
- NDLF Planning Task Force:
The NDLF Planning Task Force will
receive regular reports of progress, and will review project
methodologies. It will disseminate information to NDLF libraries that are
not participating in this particular project, and will ensure that the
MoAII Testbed Project is effectively coordinated with other NDLF projects.
The NDLF Planning Task Force will review proposals for NDLF funding, and
make recommendations about these requests to the Board of the Council on
Library and Information Resources. The Architecture subgroup of the NDLF Planning Task Force will play a key role in evaluating systems architecture proposals.
Digital Technology Plan:
- Conversion Overview:
Catalog records will be created (through
a local system or a bibliographic utility) using the USMARC standard,
sent via File Transfer Protocol (FTP) to Berkeley and loaded into the
Union Catalog for the MoAII Testbed Project. The union catalog will be
based on OCLC’s SiteSearch software, which has been contributed for this
project. Finding aids will be converted to the EAD standard using SGML
authoring tools (ArborText's Adept Editor, SoftQuad's Author/Editor,
etc.), sent via FTP to Berkeley and loaded into the finding aids union
catalog for the Making of America II Testbed Project. This union catalog
will be based on EBT’s (Electronic Book Technology’s) DynaText software.
Images and associated metadata will be created at the participating
institutions and loaded on their local servers.
- Access Overview:
Berkeley will install and run a URN naming
and resolution server developed by CNRI: the Handle System. Please note
that in the Handle System URNs are called handles. USMARC catalog
records will contain the handle (i.e., URN) of the related finding aid.
Similarly, finding aids will contain the handles (i.e., URNs) of related
images. The following represents a typical access scenario (for samples
of the access system, see Appendix A):
- Search the Making of America MARC Union Catalog:
The
user enters a union catalog search on a Web browser. The participating
institution’s Z39.50 compliant web server (e.g., Berkeley uses OCLC’s WebZ
software) sends the search via Z39.50 to the union catalog (i.e.,
SiteSearch), retrieves the results and then displays catalog records in
the Web Browser.
- Select the Link to the Finding Aid from the Catalog
Record:
The handle (i.e., URN) for the related finding aid
displays as a highlighted field in the USMARC record. The user clicks on
the handle to see the finding aid. The handle is sent to the Berkeley
local handle server and resolved into a URL for the finding aid.
- Display the Finding Aid:
The finding aid URL is sent
to DynaWeb which retrieves and converts the SGML-based document into HTML
for display on the Web browser.
- Select a Link to an Image from the Finding Aid:
The
handles (i.e., URNs) for the related images display as highlighted fields
in the finding aid. The user clicks on a handle to see an image. The
handle is sent to the Berkeley handle server and resolved into a URL for
the image.
- Display the Image:
The image URL will be sent to the
appropriate participating Web server which will retrieve and display the
image on the browser running the appropriate plug-in or helper
application.
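The access scenario above amounts to two handle resolutions in sequence: catalog record to finding aid, then finding aid to image. A minimal sketch in Python, assuming an in-memory table in place of the Berkeley handle server. Every handle, URL, and function name below is hypothetical and does not reflect the Handle System's actual interface:

```python
# Hypothetical handle table standing in for the Berkeley local handle server.
HANDLE_TABLE = {
    "example.berkeley/findaid-001": "http://findaids.example.edu/dynaweb/findaid-001",
    "example.cornell/image-0001": "http://images.example.edu/image-0001.jpg",
}

# Hypothetical catalog record and finding aid, reduced to their linking fields.
CATALOG_RECORD = {
    "title": "Sample collection",
    "finding_aid_handle": "example.berkeley/findaid-001",
}
FINDING_AIDS = {
    "example.berkeley/findaid-001": {
        "image_handles": ["example.cornell/image-0001"],
    },
}


def resolve(handle):
    """Resolve a handle into a URL, as the handle server does when a user clicks a link."""
    try:
        return HANDLE_TABLE[handle]
    except KeyError:
        raise LookupError(f"unknown handle: {handle}")


def access_path(record):
    """Follow catalog record -> finding aid -> images and return the URLs visited."""
    aid_handle = record["finding_aid_handle"]
    urls = [resolve(aid_handle)]
    for image_handle in FINDING_AIDS[aid_handle]["image_handles"]:
        urls.append(resolve(image_handle))
    return urls
```

Because records store handles rather than URLs, an institution could move a finding aid or image and update only the handle table; the catalog record and finding aid would not need to change.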
- Hardware and Software Summary:
- SGML Software:
SGML Authoring Tools
(ArborText's Adept Editor, SoftQuad's Author/Editor, etc.); SGML Browsers
(Electronic Book Technology's DynaText browser and DynaWeb, an SGML-to-HTML
real-time converter that allows viewing SGML documents on the Web);
SGML Database Manager (Electronic Book Technology's DynaText Database);
appropriate image viewers (XV for UNIX, etc.).
- Union Catalog Software:
The union catalog MARC records
will be loaded onto the Z39.50-compliant OCLC SiteSearch System running at
Berkeley.
- Web Software:
Users will access all data via Web
browsers (e.g., Netscape Navigator, Microsoft Internet Explorer, etc.). Each
campus will run a Web Server (e.g., from Netscape, NCSA, etc.) to provide
access to the decentralized images.
- URN Creation/Management/Resolution Software:
The
project will install and run the Handle System developed by CNRI. Besides
the global handle server maintained by CNRI, the Berkeley campus will also
run a local handle system that will be the home handle
service for this project. Handles generated by each naming authority
will be stored on this system and requests to resolve the handles into
URLs will be resolved by the Berkeley handle server.
- Central Hardware:
The SiteSearch MARC union catalog and
the DynaText SGML finding aids database will be loaded at Berkeley on the
Library’s Sun Microsystems SPARCcenter 2000E (a gift of Sun Microsystems).
This server and all data are backed up on a regular basis.
- Decentralized Hardware - Image Storage:
Images will
be decentralized to servers at participating institutions. Images will be
named with URNs (handles) and accessed via Web server software.
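The division of labor between the local and global handle services can be sketched as follows. A handle consists of a naming authority and a local name separated by a slash; the routing rule shown here (project naming authorities go to the Berkeley home service, everything else to the global service) is an assumption made for illustration, and the naming authorities listed are hypothetical:

```python
# Hypothetical naming authorities whose handles are homed on the Berkeley server.
LOCAL_NAMING_AUTHORITIES = {"example.berkeley", "example.cornell"}


def split_handle(handle):
    """Split a handle into (naming authority, local name) at the first slash."""
    authority, separator, local_name = handle.partition("/")
    if not separator or not authority or not local_name:
        raise ValueError(f"malformed handle: {handle!r}")
    return authority, local_name


def choose_service(handle):
    """Return 'local' for project naming authorities, 'global' otherwise (assumed rule)."""
    authority, _ = split_handle(handle)
    return "local" if authority in LOCAL_NAMING_AUTHORITIES else "global"
```

Splitting only at the first slash leaves the local name free to contain further slashes, so each naming authority can organize its own names as it sees fit.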
STAFF QUALIFICATIONS (for additional information, see Appendix
M):
- Principal Investigator:
Dr. Peter Lyman is
University Librarian of the University of California, Berkeley. Before
coming to Berkeley, he served as University Librarian and Dean at the
University of Southern California. He has written and consulted widely on
the library of the future, particularly on issues related to the use of
digital and networked information to create uniquely appropriate and
powerful tools and collections for research, teaching, and learning. Dr.
Lyman will spend approximately two hours per week on the project.
Project Director: Bernard J. Hurley, The Berkeley Library’s
Chief Scientist, has served as the Director for Access Services during the
past two years and was formerly Director for Library Systems, beginning in
1981. As Chief Scientist, Mr. Hurley is responsible for strategic
technical planning and systems architecture for the Berkeley Library. He
is The Library’s chief liaison with technology partners in the corporate
and research sectors. He has responsibility for program planning and
policy formulation as a member of the Library’s Administrative Group. Mr.
Hurley has been working in the field of library automation for the last
seventeen years. While at Berkeley he has played a central role in
developing the GLADIS System, Berkeley's online catalog, catalog
maintenance, authority control, and circulation system, and its access to
the Berkeley Campus Information Network. He has served as the Project
Director for Berkeley's U.S. Department of Education funded Finding Aid
Project, NEH funded California Heritage Digital Image Access Project, and
NEH funded American Heritage Virtual Digital Archive Project. Mr.
Hurley will spend four hours per week on the project.
Technical Consultant: Dr. Howard Besser, Professor in the
School of Information Management and Systems at Berkeley, is an
internationally-recognized expert in digital imaging metadata standards
and systems and has consulted, written, and lectured extensively on these
issues. He will spend approximately 10 hours per week on the project.
Project Manager: A Project Manager will be hired specifically
for this project. Qualifications will include supervisory and management
experience, knowledge of SGML and DTD development, and a basic
understanding of scanning, metadata, and digital library architectures.
Qualifications of Other Key Personnel:
- Archival Control Coordinator:
Jack von Euw, Head
of The Bancroft Library Technical Services, is responsible for the
reorganization and management of the unified technical services. His
previous assignment included managing The Bancroft Library Manuscripts
Retrospective Conversion Project and the processing phase of the
Preservation and Improved Access of the C. Hart Merriam Papers Project. He
was the Archival Coordinator for the Berkeley Finding Aid Project, the
California Heritage Project, and the American Heritage Project. Mr. von
Euw will spend four hours per week on the project.
Curator of the Bancroft Library Western Americana
Collections: Dr. Bonnie Hardwick’s doctorate focuses on Western
American history. She has an extensive background in Special Collections,
manuscript curatorship and processing, and archival librarianship. She will
spend two hours per week on the project.
Berkeley Library Imaging Coordinator: Barclay W. Ogden, Head
of the Conservation Department, is a specialist in the design and
administration of library preservation programs. He founded the
Conservation Department at Berkeley in 1980, which has grown under his
direction to become one of the five largest such programs in U.S. research
libraries. Mr. Ogden serves as the Director of the University of
California Preservation Program, a cooperative effort of the nine
University of California campuses, and was a leader in California's effort to
develop a statewide preservation plan, under contract to the California
State Library. He is Imaging Coordinator for the California Heritage
Project. Mr. Ogden will spend two hours per week on the project.
Evaluation Coordinator: Dr. Geri Gay directs The
Interactive Multimedia Group (IMG) at Cornell University. The IMG is an
interdisciplinary research and design team created to understand and
improve the expanding role of computers in communicating, learning,
working, and playing. IMG studies how humans interact with computers, and
how technology can mediate communication.
EVALUATION:
The evaluation of the Making of America II Testbed Project will be
led by Cornell University’s Interactive Media Group (IMG); see Appendix K.
The plan will focus on four major issues: 1) the selection of
classes of digital objects to be standardized and the elements used to
create structural and administrative metadata for each class; 2)
the effectiveness and ease of use of the MoAII Testbed for discovering and
navigating digital objects; 3) the feasibility of using the MoA II
Testbed architecture in a national digital library; and 4) the
economic feasibility of the MoAII Testbed project approach.
Issue 1: The evaluation plan calls for the development of
studies to determine whether the project selected the proper classes of
digital objects to be standardized. In addition, it will evaluate the
elements selected to be encoded as structural and administrative metadata
for each class. The goal is to determine if this selection of classes and
metadata elements provided value to scholars and students who used the
digital objects within the MoAII Testbed.
Issue 2: The project will also evaluate how effective users
found the MoAII Testbed as a research and learning tool, including issues
such as ease of use. Evaluation criteria will include searching,
navigation, pointing, linking, mapping issues, and control and display
capabilities for the different classes of digital objects. User response
to how each class of digital object could be discovered, displayed and
navigated will be measured. Beyond that, the evaluation will attempt to
measure not only whether the MoAII Testbed architecture improved search capabilities and navigation options over their print counterparts, but also whether it encourages new ways of searching and understanding the materials represented.
Issue 3: There will also be a technical evaluation of the MoAII
Testbed architecture. The design of this prototype system will be
examined, including performance issues associated with the delivery of
digital surrogates of primary sources over the network and the use of
Z39.50 to connect MARC bibliographic and SGML textual databases. The
evaluation will also study the ability of the testbed architecture to
scale to a large number of users and a large number of digital objects
spread across distributed repositories. In particular, the different
digital object naming schemes considered in the project will be evaluated
for use in a true national digital library architecture.
Issue 4: The project will evaluate whether the proposed MoAII
model offers a cost effective method of providing access and control of
digitized primary source material available over the Internet. Because
this is a research and demonstration project, the amount of usable data
will be limited. Nevertheless, economic and workflow data from all
collaborators will be collected and analyzed. The project will monitor
and evaluate the costs of training and all the different methods used in
creating digital objects that act as surrogates for primary source
materials, including the creation and encoding of metadata. The final
evaluation report will make recommendations on the most economical
approaches to the creation of a distributed repository for
archival digital objects.
For the proposed project, the IMG will employ classroom experiments,
online questionnaires, and various qualitative methods to assess the
feasibility and utility of network-based finding aids. The evaluation
plan will call for data collection, interpretation, and reporting at three
intervals: prior to implementation; at mid-project; and just before
project closure. IMG's participation will include evaluation planning and
administration, development of qualitative and quantitative measures, data
gathering, analysis, and reporting.
DISSEMINATION:
- Dissemination Plan: Project results will be made widely
available.
- Detailed reports
will be submitted to the NEH, CLIR Board, the
Policy Committee of the NDLF, the National Historical Publications and
Records Commission (NHPRC), the National Archives and Records
Administration, the SAA Committee on Archival Information Exchange, OCLC,
RLG, The Cooperative Interchange of Museum Information (CIMI), Association
for Information and Image Management (AIIM), and the administrations of
all of the project's collaborators. The reports will be posted on the
Internet for public access and review.
- The Digital Library
created during the MoAII Testbed
Project--its catalog records, finding aids, and digital images--will be
disseminated throughout the world on the Internet. It will be
demonstrated at exhibitions and presentations to professional conferences
and meetings, e.g., the CNI Task Force meeting, the ALA Annual Meeting,
the Rare Books and Manuscripts Section of the Association of College and
Research Libraries (ACRL RBMS pre-conference), the Society of American
Archivists Annual Meeting (SAA), and the American Society of Information
Science (ASIS) Annual Meeting. It will also be demonstrated at scholarly
meetings.
- Papers
will be contributed to professional conferences and
meetings (e.g., ALA, ACRL RBMS pre-conference, SAA, ASIS, Western History
Association Annual Meeting, Organization of American Historians Annual
Meeting, and the American Historical Association, Pacific Coast Branch
Annual Meeting).
Papers will be submitted to manuscripts and archives
journals (e.g., The American Archivist); automation journals (e.g.,
LITA's Information Technology and Libraries, Educom Review,
and Academic and Library Computing, DLIB); and history
journals (e.g., Western Historical Quarterly and Pacific
Historical Review).
Draft practices, procedures, recommendations, etc. will
be made widely available throughout the project, using the MoAII Testbed
Project website.
- An invitational conference, following the conclusion of the
project, will be funded by NDLF. The purpose of this conference will be
to evaluate the feasibility and desirability of building on the prototype
developed in the project; to refine the set of "best practices" for the
creation of digital object archives and libraries of distributed resources;
to recommend next steps in the development of standards and policies for
such projects; and to formulate additional research and demonstration
questions to be pursued.
- Continuation of Work After the Termination of NEH Funding:
The project's participating institutions will have a considerable stake in
making sure the research in this project is successfully carried out and
that its results are followed up with further research and development.
The members of the NDLF will use the project results to stimulate the
national process that will lead to the formal adoption of metadata
standards for digital library objects. More ambitiously, NDLF will use
project results to lead to the creation of a national architecture for the
digital library which will allow for access to decentralized collections
of digital surrogates on the network. Moreover, the digital library
created through this project will serve as the foundation for a production
effort to create the Making of America Digital Library. We expect that
this project will identify a future research agenda for issues such as
the following:
- archiving, refreshing, and migration to ensure the longevity of
the digital library;
- development of scholarly tools to enhance the usefulness of
the digital library;
- development of digital library architectures that support
storage and delivery of complex digital objects, including methods for their use;
- establishment of practices for the management of intellectual
property, including authentication of objects and users, authorization, and internet commerce;
- investigations into economic models that can sustain and
expand a distributed national digital library.
During the course of this project, each participant will develop
expertise in the creation of digital libraries, and will establish a
repository for the digital objects created; each institution has committed
to keeping the images stored on this repository accessible and to maintaining
the links among MARC records, EADs, and images. Berkeley is committed to
maintaining the union catalog and EAD database indefinitely, at least
until such time as either there is an independent entity through which
such resources are archived and made internationally accessible, or
it is technically feasible to implement the digital library effectively
using distributed metadata records.
Footnotes:
1.
The following are suggested administrative metadata categories for archival images based on the Digital Scriptorium Project, a collaborative imaging effort between Berkeley and Columbia. These may serve as a starting point for the MoAII Testbed Project. Included are those elements that are necessary both for initial image processing and for long-term administrative and scholarly use; some are characteristics of the initial image capture, and some may be integrated in the capture system or part of corollary procedures. Those that should be retained for long-term use with the archival version of the image are marked with an asterisk. A proposed starting point for administrative metadata for archival image files includes:
*Date of Capture
Capture System:
*Type, *Brand, *Model, *Bit depth, *Color information system, *File format, Default setting/adjusted
*Image Authentication System Used (such as electronic watermarking)
Then, depending on the type of capture system (Below, possible administrative metadata relating to three different capture methods are described):
Digital Camera:
Batch #, *Pixel dimensions, *dpi, F-stop, *Electronic "shutter" speed, *Filter, Illumination level, *Glass used Y or N
Film/Photo CD:
Film roll #, Film frame #, Roll index #, F-stop, Shutter speed, *Filter, Film type, Illumination level, *Glass used Y or N, *Pixel dimensions, *dpi
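The asterisk convention above lends itself to a simple machine-readable form. A minimal sketch in Python, assuming the comma-separated listing style used in this footnote; the function names are hypothetical, and the project's eventual metadata encoding may differ:

```python
# The Digital Camera listing, copied from the footnote above.
DIGITAL_CAMERA = (
    'Batch #, *Pixel dimensions, *dpi, F-stop, *Electronic "shutter" speed, '
    "*Filter, Illumination level, *Glass used Y or N"
)


def parse_elements(listing):
    """Turn a comma-separated listing into (element name, retain long-term) pairs."""
    elements = []
    for raw in listing.split(","):
        raw = raw.strip()
        if raw:
            # A leading asterisk marks an element to keep with the archival image.
            elements.append((raw.lstrip("*"), raw.startswith("*")))
    return elements


def retained(listing):
    """Return only the element names marked for long-term retention."""
    return [name for name, keep in parse_elements(listing) if keep]
```

Applied to the Digital Camera listing, `retained` keeps Pixel dimensions, dpi, Electronic "shutter" speed, Filter, and Glass used Y or N, and drops the capture-time-only elements such as Batch # and F-stop.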
THE MAKING OF AMERICA II TESTBED PROJECT
Appendix A
Sample Discovery and Navigation System:
Collection-level Records
Encoded Archival Descriptions
Digital Images
Appendix B
The National Digital Library Federation
Appendix C
The Making of America II Project, Background
Appendix D
Subcontractor proposals and budgets
Appendix E
The Encoded Archival Description
Appendix F
Architecture for Information in Digital Libraries
Appendix G
Previous Research:
The Berkeley Finding Aid Project
California Heritage Project
American Heritage Virtual Digital Archive Project
UCEAD Project
Appendix H
Related Work:
Making of America
J-STOR
American Memory
Dublin Core
Appendix I
Naming:
IETF URNs
OCLC PURL
CNRI Handle
OMG CORBA
Microsoft DCOM
RLG Arches
Appendix J
Structural Metadata:
E-Bind
Appendix K
The Interactive Media Group,
Cornell University
Appendix L
History of Grants
Appendix M
Resumes of Project Participants
Advisory Board
Appendix N
Suggested Evaluators