[MOA2-WP:24] Response to Stanford Comments on MOA2 White Paper

Merrilee Proffitt (mproffit@library.berkeley.edu)
Thu, 14 May 1998 14:43:40 -0700 (PDT)

Sorry this took so long to get out. Thanks to Jim and Stanford, who
obviously put a lot of thought into this.

My comments in [ ]

Merrilee

---------- Forwarded message ----------
Date: Thu, 7 May 1998 07:29:39 -0700 (PDT)
From: Jim Coleman <Jim.Coleman@stanford.edu>
Reply-To: moa2-wp@sunsite.berkeley.edu
To: Multiple recipients of list <moa2-wp@sunsite.berkeley.edu>
Subject: [MOA2-WP:21] Stanford Comments on MOA2 White Paper

Stanford has two sets of comments. One set pertains to components of the
technical architecture itself, and seeks to address point 3. of the
planning phase document, i.e., a review of the overall architecture and
planning. The second set relates to some of the concrete issues and
recommendations that are made within the paper.

TECHNICAL ARCHITECTURE

Stanford understands that the White Paper is an initial proposal for and
description of the technical architecture to be employed during the
implementation phase of the project. As such, the White Paper is at
this stage more interested in setting forth an analytical framework,
defining the class or range of materials that MOA2 will include in its
digital library, and considering broadly the retrieval, access, and
navigation issues associated with the digital library and the community
standards that support access and retrieval than it is in considering
actual implementation details. As written, however, the WP contains a
mixture of the theoretical/technical and the practical that is occasionally
confusing, and this intermingling of the purely theoretical (e.g., color
analysis) with practical (e.g., ability to match items from the same paper
stock by using color analysis) clouds both the achievements and
deficiencies of the document.

[A good observation (mixing of theoretical and practical). We will try to
straighten this out in the next draft]

As achievements, we would count:

* the Service Model itself, which, we believe, lays out, at the theoretical
level, a layered application model that is in line with current "best
practice" thinking on the construction of digital libraries.

* The metadata layering and schemata also represent a step forward in
codifying some of this information (although we believe, along with our
colleagues at Penn and NYPL, that this needs particular refinement that
will come as we work with the actual documents -- and not abstracted
models).

* The use of object oriented thinking/vocabulary is a mixed
blessing: while it offers some concrete conceptual advantages, it is
overly complex for many readers with some technical background, not to speak
of users of the "decision maker" object class. It also occasionally leads
to fuzzy thinking on the part of the authors who tend to take their models
as reality. In fact, just about *every* actual archival object is
*much* more complex than our current OO (object oriented) model for it. Here
caution is warranted. We need to be sure that the "sharable" parts of the WP
can be grasped by the right audiences.

[The next draft will include a readers guide, an outline which will
summarize the document as a whole and guide those who are interested in
technical discussion to those sections. And hopefully steer those who are
not interested in this information away from those sections.]

* The recognition *throughout* that the metadata models/structures (EAD
included) are just that: models and "exchange formats" and do not
presuppose an implementation strategy.

As deficiencies, we would name three:

* Since much attention is given to access and retrieval, we were
surprised to find no discussion at all about relevant vocabularies within
the project domain. Here intellectual control using a standard set (we are
thinking of the Getty vocabularies: AAT, TGN, ULAN) is particularly
appropriate, and should receive some attention in the document.

[While I agree with you, this project does not focus on intellectual
metadata, and I think discussion of controlled vocabularies is best left
for another paper.]

* There is a disjuncture between the WP Service Model and the proposed
implementation: creating a union catalog of MARC records along with
associated finding aid(s), both of which would sit at Berkeley, is a model
which exercises virtually none of the advantages of the WP Service Model.
We need as a group to decide whether this disjuncture could be fatal to
our progress -- after all, how interesting will this implementation be as a
technical exercise in 2-3 years, when the project is complete? Would we not
want to consider a distributed model, along the lines of the extensions to
Z39.50 or the CIMI profile?

[Yes! We would have liked to have carried out work on a real distributed
model, but unfortunately that was not funded. Even though the testbed
will be housed in a non-distributed model, we should be collecting our
metadata and building our objects to live in a real distributed
environment.]

* The original grant proposal suggested that some method of establishing
persistence for naming -- perhaps using DOI in conjunction with a handle
server -- would be implemented and evaluated as part of the project. This
component appears to have vanished from the WP. Was this an oversight or
intentional?

[Again, this work was not funded by DLF.]

CONCRETE ISSUES/RECOMMENDATIONS

Stanford concurs with many of the comments of our colleagues at NYPL and
Penn State, particularly as these relate to the granularity of metadata
capture. In most instances, we assume that the issues of detail raised by
NYPL and Penn State will be worked out as a part of the project.

* Item-level metadata capture, although of indisputable value, is very
expensive. Stanford projects suggest that the cost of management and
capture is at least as great as the cost of image capture/storage. Given the
very low funding by the grant, institutional "co-contributions" to implement
a relatively "cooked" model would probably exceed grant funds by 3 to 5
times.

[I have touched on this in previous responses; we would only expect a few
fully cooked objects throughout the project, with most objects being raw
or seared. Item-level metadata capture should be easily batched.]

* While the EAD may be a good exchange format, it is arguably not always a
good method (as an "object" or container) for delivering such collections.
The WP also argues for a level of object access/retrieval/navigation that
goes beyond what the EAD is currently capable of. How do we imagine that this
"enhanced" access will be achieved? On what software platform?

[What we intend is that navigation within a collection will be done with
EAD encoded finding aids using DynaWeb, or through SiteSearch for MARC
records. The tools, which will be at the object level, will be JAVA tools
in a UNIX environment.]

* Stanford will not be able to participate in any portion of the project
devoted to the access needs of a K-12 constituency. This also means that any
associated *metadata* required for K-12 access will have to be supplied by
others (or be inherited through other metadata classes).

[We are working under an assumption that K-12 tools will be much simpler
than those needed for a scholarly community (page turning as opposed to
text analysis). Therefore metadata for use by a K-12 type audience should
be more along the lines of the raw than what is needed for cooked.]

* Stanford will not be able to meet the minimum recommendations of color
image capture if these are to be 24-bit, 600 DPI. We assume, in fact, that
the 600 DPI recommendation is for grey-scale or bitonal photographs, not
color, where we believe this level of capture is overkill. Consider: an 11
by 17 inch image captured under these settings would be over 200MB in
size, uncompressed (also recommended). Did the WP really mean to recommend
600 DPI, 24-bit color?

[One of the considerations in digitizing material from this period must be
preservation. Yes, we are suggesting the creation of large files. But we
certainly do not want to re-scan these items in five years. Again, these
are recommendations, and not requirements. As with recommended metadata,
participants should pay attention to recommendations, but they are just
that--recommended.]
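
The file-size figure quoted in the Stanford comment above can be checked
with a few lines of arithmetic. The following is only a sketch: the function
name and its parameters are illustrative and do not come from the White
Paper, and it assumes uncompressed storage with no header or padding
overhead.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size, ignoring file headers and row padding."""
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8        # round up to whole bytes


# 11 by 17 inch page at 600 DPI in 24-bit color, 8-bit grey-scale, and bitonal
for label, depth in (("24-bit color", 24), ("8-bit grey", 8), ("bitonal", 1)):
    size = uncompressed_size_bytes(11, 17, 600, depth)
    print(f"{label}: {size:,} bytes ({size / 1_000_000:.2f} MB)")
```

Under these assumptions the 24-bit case comes to 201,960,000 bytes, which is
"over 200MB" when a megabyte is counted as 10^6 bytes (about 192.6 MB if
counted as 2^20 bytes). The same page is one third of that size in 8-bit
grey-scale and one twenty-fourth in bitonal.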

* Lastly, we would like some clarification among participants on whether
we are attempting as a group to gather together and coordinate the items
to be part of MOA2 in the hopes of creating a "virtual collection" whose
sum is greater than the parts, and where the individual "parts" (i.e.,
the complete collection of any single institution) may not be completely
represented. Or are we presenting five separate collections that have a
common thread in subject content and where the achievements of the project
will relate to delivery/metadata standards? Given the *original*
conversations among participants (pre-final grant proposal), we would have
thought the former to be the case, but the responses to Penn State/NYPL
suggest otherwise.

[Given that the grant is for research and demonstration, and not for
collections per se, I think the emphasis is more on the proof of concept,
and not on content. I also think we will wind up with a selection of
online materials which are not represented in each other's collections.]

------------------
Jim Coleman
Head, Academic Computing for the Humanities
Stanford University Libraries
380 Meyer Library
560 Escondido Mall
Stanford, CA 94305
650-725-3163
http://www-sul.stanford.edu/depts/hasrg/ats/ats.html