Here, at long last, are my minutes from the meeting in New York. At the
end of the minutes, I've appended some work items that I think fell out of
our discussions, and also some research questions. I expect that both of
these lists will be with us for a while, and we can watch them swell and
shrink.
I welcome any sort of comments or discussion that these minutes might
generate.
Hope all is well with you, and I trust you are all playing with your
databases. Anyone who hasn't submitted receipts should do so very soon.
Our business office frowns on receipts that are not submitted promptly.
Merrilee
Monday, September 28, 1998
The scope of the project is not to create a critical mass of materials for
the testbed. Instead, we are aiming to produce a testbed with many different
examples of encoding so that we can 1. understand what functionality
scholars need from the tools for different types of objects, 2. identify
what metadata is needed to support these functionalities, and 3. come to
some understanding of the relative cost of supplying the metadata vs. the
need for the functionality it supports.
Selection of materials. Penn State: focus on labor and transportation
materials. Correspondence, minute books, ledgers, photographs. NYPL:
Jackson and Stanton diaries and photographs. Cornell: letters, drawings,
photographs, negatives, correspondence.
Discussion: how should we deal with items that aren't normally listed in a
finding aid? Describe them only in a MARC record, or "force" them into a
finding aid?
How much backtracking should we expect to support (allowing people to get
from an object back up through the tree via a series of back buttons)?
This is a really interesting question, for this project or for future
research: what tools and metadata are needed to do this?
While the basic assumption for this project is that researchers will get
to objects in a traditional way (using the catalog to find collections,
using a finding aid to locate items of interest within the collection),
the question of navigation once a user gets to an object is one that is
within the scope of the project. Peter wants pointers from MOA2 objects
back to folders and boxes. Jerry thinks he can do this with xptrs.
Bernie: is anyone interested in doing a finding aid for the overall
collection? Or should we perhaps use the EAD group to unify our
"collection"? The idea of an overall finding aid is problematic, but a
collection guide was put forward as a possible solution. The problem is
how it would scale, and who would maintain it. Another research question,
perhaps.
Discussion of putting subject headings into finding aids; naming issues
(using PURLs and a PURL server for images); quality control issues (deemed
out of scope).
Where should transcribed captions be kept: in the database or in a
separate file of transcriptions?
By its end, this project should produce a decision tree: a guide that
would allow people to make more informed decisions about how to proceed.
Who will "cook" what and how much?
Berkeley will probably provide some TEI transcriptions of Dictations.
NYPL will provide transcriptions from a typescript which was made from the
actual diary. Penn State has minute books with diaries which could point
to letters. Cornell doubts that much, if anything, will be cooked.
Discussion of subject access in transcripts and best practice in full
cooking. Discussion of possibilities, expectations and reality, and the
dangers therein.
Will we evaluate how much all of this costs? No, since we won't be in
"production", there is no real way to assess costs. Christie suggests
that we track use with respect to differences between levels of
"cookedness" and look at what gets used.
Suggestion to track not relative cost but the decisions made and why: a
decision tree in reverse. Each institution would log why it chose to
follow a particular path, and also record what other options were
considered and why they were rejected. This sort of narrative could then
be shaped into an overall report that may help guide practice.
Discussion of database:
The database is meant to serve two major purposes: 1) to provide a means
of defining objects and collecting their structural and administrative
metadata, and 2) to provide a means of controlling the digital production
environment in which they are produced. From this information, Berkeley
will be able to produce MOA2 objects automatically.
Some comments which came up during the database discussion:
There is a need to add batch subobject generation to the database.
Color bar: is it there? If so, what kind is it?
We will need best practices for structuring subobjects within the
database, and possibly standards for naming objects within the database.
Required fields: which ones are they, what does "required" mean (the
database record can't be created, or the XML document won't be valid?),
and how do we indicate them in the database? We decided that there would
have to be some agreement among members (in the form of "best practices")
about which fields are required, and what to do if they aren't filled in.
Required fields will be color coded in the database.
"Best practices" would also be established for defining objects, labeling
subobjects, constructing file names, etc.
Quality control: Berkeley will look into presenting image files on demand
from inside the database for this purpose.
Berkeley will also look into batch generation for derivatives; also look
into the possibility of extracting administrative metadata from existing
images and importing that metadata into the database.
Proposal for master file defaults for pixel height and width.
Tuesday, September 29, 1998
Introductions, recap of previous day.
Howard's presentation of the white paper. Until now, the digital library
has been one in which users interact with each library separately. Z39.50
and similar standards have been moving away from this on the search side;
MOA2 helps move away from it on the presentation side. Digital libraries
need standards for discovery (searching), knowing about (administrative
metadata), navigating (structural metadata), and rights information. MOA2
is about administrative and structural metadata. Why are standards
important? Longevity, interoperability, vendor incentive, and making the
path easier for others in the future. What might some behaviors be?
Navigation, display/print behaviors. Use/users of collection. "Scan at the
highest quality that does not exceed the likely potential
use/users/materials." (Christie: when entering administrative metadata
fields, we need to have a discussion about how people are going to fill
those fields out, using controlled vocabulary, for example.) Physical
dimensions may merit further discussion. Extracting information from the
header will be important.
For the project, CLUT (color lookup table) might not be a required field.
Q: should the database refuse records that do not contain "required
fields"? A: No, because this might impede workflow; instead, perhaps MOA2
objects don't get created if required fields aren't present. Q: if
copyright information isn't there, does that imply that there are no
restrictions on the object? A: No. There is a real need for us as a group
and as individuals, and for the community as a whole, to be able to "mine"
certain metadata types directly out of TIFF headers. Rick will look into
this and see if it is possible in the Java class.
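As a rough illustration of what "mining" a TIFF header involves, here is a minimal sketch in Python. It is not the Java class Rick was to investigate. It assumes a baseline TIFF 6.0 file already read into memory and looks only at the first image file directory (IFD). The function name read_tiff_header_fields and the small set of tags it picks out are hypothetical choices made for illustration, not anything the group agreed on.

```python
import struct

# Hypothetical selection of baseline TIFF 6.0 tags that map onto
# administrative metadata fields (tag number -> field name).
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression",
             262: "PhotometricInterpretation", 282: "XResolution",
             283: "YResolution"}


def read_tiff_header_fields(data: bytes) -> dict:
    """Sketch: return selected single-valued tags from the first IFD."""
    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    order = data[:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # Bytes 2-3 hold the magic number 42; bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    fields = {}
    for i in range(entry_count):
        # Each 12-byte IFD entry: tag, field type, value count, value/offset.
        entry = ifd_offset + 2 + 12 * i
        tag, ftype, count = struct.unpack_from(endian + "HHI", data, entry)
        if tag not in TAG_NAMES or count != 1:
            continue  # this sketch skips unlisted tags and multi-valued tags
        if ftype == 3:  # SHORT: value left-justified in the 4-byte field
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
        elif ftype == 4:  # LONG: value fills the 4-byte field
            (value,) = struct.unpack_from(endian + "I", data, entry + 8)
        elif ftype == 5:  # RATIONAL: field holds an offset to two LONGs
            (offset,) = struct.unpack_from(endian + "I", data, entry + 8)
            num, den = struct.unpack_from(endian + "II", data, offset)
            value = num / den if den else None
        else:
            continue  # other field types are ignored in this sketch
        fields[TAG_NAMES[tag]] = value
    return fields
```

Values pulled out this way (pixel dimensions, resolution) could be used to pre-fill the corresponding administrative fields in the database rather than being typed in by hand. Tags stored as arrays, such as BitsPerSample on RGB images, would need additional handling that this sketch leaves out.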
Discussion of controlled vocabulary for filling in metadata fields. How
should it be handled? Tabled for the time being.
DTD presentation: review of the tutorial. The DTD has three sections: file
inventory, administrative metadata, and structural map. Warning on using
the database and transcriptions. We are assuming that the database is the
primary source of a document's structure: what goes into the database is
what goes out to the user. The label in the MOA2 document will come from
the caption in the database. Discussion of whether MARC records should be
put into the MOA2 object, and of putting MARC records into finding aids.
Maintenance problems seem to preclude this approach. Should transcriptions
point at MOA2 objects? What complications does this pose? How will links
get into finding aids to point to MOA2 objects? From TEI item to finding
aid, or to MARC record, or to the top, which will then get you to MOA2
objects? Is it possible to embed links to MOA2 documents in finding aids?
Bernie suggests embedding the MARC record in the MOA2 document.
Tools
Focus on scholarly use of MOA2 objects. One thing the group might discuss
is what other groups might be users of MOA2 objects. Peter brings up
searching and levels of granularity. Should people be able to search
within MOA2 objects, for example in caption fields? Perhaps by indexing
finding aids, MARC records, MOA2 documents, TEI documents, etc. How do the
tools cope with all of this? Deemed out of scope, because the project is
not about discovery.
Moving around is within scope: navigating between "levels" (MARC, finding
aids, objects). What is the world of discovery? Should this lead back to a
library's catalog? Or is the user stuck in a MOA2 transportation
cul-de-sac? Because this is a testbed, yes; but in the real world, MOA2
objects will be a part of the library, so no. Reviewing groups of images,
or thumbnails. Peter makes the point that there is very little hierarchy
in most archival documents. Navigation, in his view, means maintaining the
ability to move through the intellectual hierarchy of organization and
arrangement. Caroline Arms suggests that a more generalized set of tools
could display dirty OCR or not, accordingly. Question from Bob about an
unordered collection of photos: how should these be displayed? Browse a
set of thumbnails, maybe choose from those, and devote as much screen real
estate to this as possible. Issues surrounding printing need further
discussion. Ideas about having two frames, showing different or identical
views, which scroll together or separately.
Note from Caroline: always make clear whether images are pictorial items
or page images.
IMG
Presentation from IMG and follow-up discussion from the group. What would
be best is not a user evaluation of a product, but an evaluation of
existing needs. A major question we want to answer is: "Given that
development of these sorts of tools is possible, what features do you need
to do your work?" Determine whether similarities or differences between
and among object types (e.g. photos, diaries, scrapbooks) are important in
designing tools. Bernie's summary: use the testbed to discover what users
need and why they need it. What specifically to evaluate: navigation, or
navigation within an object? Evaluating needs for navigating within a
context (how the item relates to the collection). Have we effectively
presented context? Administrative metadata, and how this is important to
scholars. Something around searching. What do we gain or lose compared
with a paper-based environment?
Feeling throughout the meeting that the "tradition" of MARC records and
finding aids is not going to work for users, and that people will want to
get down to the object level more quickly.
Controlled vocabularies:
Next steps: database for review next week, comments back in three weeks
and return to Berkeley. Finding aids on November 1st.
Required fields in the database vs. DTD vs.: John will color code them in
the database. Source may not be present in the database. Discussion of
source as a form of metadata. Recording the size of the original, when
you don't have it. Dimensions of the original are needed in order to
print at the original size.
SAA: panel on MOA2. Peter and Bernie are putting this together by October
19th. It is important for archivists to think about these issues.
Summary of Work Items:
Database review to be completed by October 26th; revised database to be
available by December 1 (this date depends on the type of revisions
suggested).
Finding aids to be completed by November 1st.
Jerry to look into providing xptrs to point from MOA2 objects back to
boxes and folders in finding aids; work has already been done towards
this.
"Decision tree" to help data owners determine tradeoffs in treating their
data in particular ways.
"Best practice" document for entering data into database for particular
types of subobjects.
Development of controlled vocabulary for filling in database fields for
particular types of objects.
Decide on basic functionality for tools.
Investigate possibility of TIFF header extraction.
Look for and capitalize on opportunities to discuss the project.
Summary of research questions:
Navigation within the context of a collection.
How to manage large collections of MOA2 objects? With a finding aid or
collection guide? How to handle in a scalable manner?
The role of descriptive metadata (i.e. MARC) in the MOA2 object or finding
aid.