MOA2 Digital Object Document Type Definition Tutorial

An XML Document Type Definition (DTD) has been created for MOA2 digital objects. This DTD provides a means of encoding the descriptive, administrative, and structural metadata for all electronic versions of a particular archival object.

An MOA2 digital object consists of four major sections:

  1. a descriptive metadata section;
  2. a file inventory section;
  3. an administrative metadata section; and
  4. a structural map.

The descriptive metadata section may point to external descriptive metadata (such as a finding aid or MARC record) or it may itself contain embedded descriptive metadata, or both. The file inventory section lists all of the files comprising all electronic versions of the archival object, with the files grouped together by version. The administrative metadata section provides information regarding how the files were created and stored, intellectual property rights, and the source of the files. The structural map provides one or more hierarchical descriptions of the original archival object’s structure (either logical or physical structure), and provides pointers from locations within those hierarchies to the files which comprise the various electronic versions. A more detailed explanation of each section and their inter-relations follows.

Descriptive Metadata

External Descriptive Metadata: Descriptive Metadata Reference. A descriptive metadata reference simply provides the URI for an external source of descriptive metadata. For example, the descriptive metadata reference below points to an OAC-based finding aid:

<DMDRef LOCTYPE='URL' DMDTYPE='FINDAID'>http://www.oac.cdlib.org/dynaweb/ead/calher/breen/</DMDRef>

This <DMDRef> contains two attributes. The LOCTYPE specifies the type of URI being provided (PURL, HANDLE, DOI and PDI would be other options). The DMDTYPE identifies the type of descriptive metadata being referred to: MARC record, FINDAID, RDF, PICS or OTHER. Additional supported attributes provide for specifying the MIMETYPE of the external descriptive metadata, and a LABEL that can be used to identify the available descriptive metadata to the user.
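
As a rough illustration of how these attributes might be read by a program, the following Python sketch parses a single <DMDRef> with the standard library's xml.etree.ElementTree. The function name read_dmdref and the dictionary keys are hypothetical choices made for this example; they are not part of the DTD.

```python
import xml.etree.ElementTree as ET

# Sample fragment adapted from the tutorial text.
DMDREF_XML = (
    "<DMDRef LOCTYPE='URL' DMDTYPE='FINDAID'>"
    "http://www.oac.cdlib.org/dynaweb/ead/calher/breen/"
    "</DMDRef>"
)


def read_dmdref(xml_text):
    """Return the attributes and URI of one <DMDRef> (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    return {
        "loctype": elem.get("LOCTYPE"),
        "dmdtype": elem.get("DMDTYPE"),
        # MIMETYPE and LABEL are optional, so these may be None.
        "mimetype": elem.get("MIMETYPE"),
        "label": elem.get("LABEL"),
        "uri": (elem.text or "").strip(),
    }


dmdref = read_dmdref(DMDREF_XML)
```

Because MIMETYPE and LABEL are absent from the sample, their entries come back as None rather than raising an error.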

Embedded Descriptive Metadata. Embedded descriptive metadata appearing under a <DMD> element can either use generic descriptive metadata elements defined in the DTD, or another user-defined text format enclosed in a wrapper. The generic descriptive metadata elements provided for by the DTD are grouped under a <GDM> element. These are closely related to the descriptive metadata fields supported by the Berkeley Generic Database--a database designed to gather the metadata needed to construct both EAD and MOA2 objects. The core elements include title, date, caption, dimensions, and material origin. In addition to the core descriptive metadata elements, the following elements are supported: administrative information, alternate date, content, creator, general notes, physical description, related materials, subobject source, and subject. Most of these elements can be qualified by a field type attribute which identifies the type of information contained in the element more precisely. For example, by means of the FieldType attribute, a general note (<General>) could be specified to be an appraisal note, a bibliography note, a biographical note, a series note, a scale note, etc.

The following example shows a generic descriptive metadata element for a map edition subobject in an MOA2 object for a map quadrangle.

<GDM ID='DM5'>
    <AltDate Date='1981' Seq='0.0'>Printed</AltDate>
    <Core>
        <coreDate primaryDate='1980' >Photorevised</coreDate>
        <Dimensions height='58.0' width='46.0' units='cm.' />
        <SOType>map</SOType>
        <Title>1980</Title>
    </Core>
    <Creator NameType='cn' SrcCheck='No' Role='Publisher' Seq='1'>U.S. Geological Survey</Creator>
    <General Seq='1.0' Public='Yes' FieldType='genScale'>1:24,000</General>
    <General Seq='2.0' Public='Yes' FieldType='genGeneral'>Contour interval 40 feet</General>
    <General Seq='3.0' Public='Yes' FieldType='genGeneral'>Polyconic projection; NAD 1927</General>
    <General Seq='4.0' Public='Yes' FieldType='genGeneral'>Revisions shown in purple and woodland
    compiled from aerial photographs taken 1979 and other source data. This information not field checked.
    Map edited 1980.</General>
</GDM>

Note that the <GDM> element contains an ID attribute. This attribute provides a unique, internal name for this particular GDM element which can be referenced by other portions of the document. In particular, it can be used to link a particular division of the document hierarchy to a particular GDM element, as will be described in the Structural Map section below.
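
To make the role of the FieldType qualifier more concrete, here is a small Python sketch that groups the <General> notes of a <GDM> element by their FieldType value. The sample is a trimmed copy of the example above, and the helper name notes_by_field_type is an assumption made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the map edition example; only a few elements are kept.
GDM_XML = """
<GDM ID='DM5'>
    <Core>
        <Title>1980</Title>
    </Core>
    <General Seq='1.0' Public='Yes' FieldType='genScale'>1:24,000</General>
    <General Seq='2.0' Public='Yes' FieldType='genGeneral'>Contour interval 40 feet</General>
    <General Seq='3.0' Public='Yes' FieldType='genGeneral'>Polyconic projection; NAD 1927</General>
</GDM>
"""


def notes_by_field_type(gdm):
    """Group the text of <General> notes under their FieldType value."""
    grouped = {}
    for note in gdm.findall("General"):
        text = (note.text or "").strip()
        grouped.setdefault(note.get("FieldType"), []).append(text)
    return grouped


gdm = ET.fromstring(GDM_XML.strip())
gdm_notes = notes_by_field_type(gdm)
```

The ID attribute (gdm.get("ID")) is the value that other parts of the document would use to refer to this element.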
 
 

File Inventory

The file inventory section consists of one or more file groups. A file group lists all of the files which comprise a single electronic version of the archival object. As a simple example, the following file group describes an electronic version of a single, stereoscopic photograph:

<FileGrp VERSDATE='1998-09-01' ADMID='HIREZJPG IPRIGHTS IMGSRC'>
    <File ID='HRJ1' MIMETYPE='image/jpg' SEQUENCE='1' SIZE='220346' CREATED='1998-09-01' OWNERID='CPR-hi.jpg'>
        <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/CPR-hi.jpg</FLocat>
    </File>
</FileGrp>

The <FileGrp> tag contains two attributes: VERSDATE, which provides the date this version was created, and ADMID, which provides the names of various sections within the administrative metadata section of the document which apply to all the files in this file group. The <FileGrp> element contains a <File> element (which provides some additional information regarding this particular file in its attributes), which in turn contains a <FLocat> (file location) element. The <FLocat> provides a network location for this file (in this case, a URL), and provides an attribute to specify whether this location is a URL, PURL, URN, etc.

More complex objects will have more complicated file groups for each version. A single file group for a photo album, for example, would contain numerous individual files (one for each page image, and possibly additional <File> elements for page details). Also, a single document may contain more than one <FileGrp>. For instance, a document for the stereographic image referenced above might have one file group for the high-resolution, archival-quality scanned image, and another for a low-resolution thumbnail image.

Note that the <File> element contains an ID attribute. This attribute provides a unique, internal name for this file which can be referenced by other portions of the document. You’ll see this type of referencing in action when we look at the Structural Map Section.
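
One plausible way for software to use these IDs is to build a lookup table from each <File> ID to its location. The Python sketch below does this for a trimmed copy of the file group shown earlier; the helper name index_files and the shape of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stereograph file group; some attributes are omitted.
FILEGRP_XML = """
<FileGrp VERSDATE='1998-09-01' ADMID='HIREZJPG IPRIGHTS IMGSRC'>
    <File ID='HRJ1' SEQUENCE='1' SIZE='220346' CREATED='1998-09-01' OWNERID='CPR-hi.jpg'>
        <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/CPR-hi.jpg</FLocat>
    </File>
</FileGrp>
"""


def index_files(file_grp):
    """Map each <File> ID to its size and <FLocat> details (hypothetical helper)."""
    index = {}
    for file_elem in file_grp.findall("File"):
        flocat = file_elem.find("FLocat")
        size = file_elem.get("SIZE")
        index[file_elem.get("ID")] = {
            "size": int(size) if size is not None else None,
            "loctype": flocat.get("LOCTYPE") if flocat is not None else None,
            "location": (flocat.text or "").strip() if flocat is not None else None,
        }
    return index


file_grp = ET.fromstring(FILEGRP_XML.strip())
file_index = index_files(file_grp)
```

A lookup such as file_index["HRJ1"]["location"] then yields the network location recorded in <FLocat>.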

Administrative Metadata

There are three main forms of administrative metadata recorded in the document: file management information, intellectual property rights information, and information regarding the original source of the electronic files referred to by the document. Multiple instances of each of these types of information may occur within a single document. For example, suppose that our stereographic image came in two different image file formats, a high resolution JPEG and a thumbnail GIF. You might create file management sections for each of these two formats as follows:

<AdminMD ID="HIREZJPG">
    <FileMgmt>
        <Image>
            <Compression>JPEG</Compression>
            <Dimensions X='1536' Y='1024' />
            <BitDepth BITS='16' />
            <ColorSpace>RGB</ColorSpace>
            <CLUT FResident='YES'></CLUT>
            <Resolution>200 DPI</Resolution>
        </Image>
    </FileMgmt>
</AdminMD>
<AdminMD ID="THUMBNAIL">
    <FileMgmt>
        <Image>
            <Compression>LZW</Compression>
            <Dimensions X='192' Y='128' />
            <BitDepth BITS='16' />
            <ColorSpace>RGB</ColorSpace>
            <CLUT FResident='YES'></CLUT>
            <Resolution>30 DPI </Resolution>
        </Image>
    </FileMgmt>
</AdminMD>

The first administrative metadata section provides file management information regarding the high resolution (1536x1024 16-bit RGB color) JPG format; the second describes the low resolution (192x128 16-bit RGB color) GIF. Note that each administrative metadata section (<AdminMD>) has an ID attribute: HIREZJPG and THUMBNAIL, respectively.

A review of the earlier file group example shows:

<FileGrp VERSDATE='1998-09-01' ADMID='HIREZJPG IPRIGHTS IMGSRC'>
    <File ID='HRJ1' MIMETYPE='image/jpg' SEQUENCE='1' SIZE='220346' CREATED='1998-09-01' OWNERID='CPR-hi.jpg'>
         <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/CPR-hi.jpg</FLocat>
    </File>
</FileGrp>

Notice that the <FileGrp> tag has an ADMID attribute, the first item in which is the name HIREZJPG. This means that all of the administrative metadata contained within the <AdminMD> section carrying the ID attribute "HIREZJPG" applies to this file group (and by extension, all the files contained within it). In short, all of the files within that file group should be 1536x1024 16-bit RGB color images.

You’ll note that the <FileGrp> tag also has ADMID names of IPRIGHTS and IMGSRC. If you examine the example in Appendix I, you’ll find the administrative metadata sections carrying these names. These sections provide additional administrative metadata describing the files in this file group.
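
The linkage between ADMID names and <AdminMD> sections can be sketched in Python as follows. This is only an illustration: the <Doc> wrapper element is a hypothetical stand-in for whatever root element a full MOA2 document uses, the helper name resolve_admid is invented for this example, and the sample deliberately omits the IPRIGHTS and IMGSRC sections to show how unresolved names might be reported.

```python
import xml.etree.ElementTree as ET

# <Doc> is a hypothetical wrapper, not an element defined by the DTD.
DOC_XML = """
<Doc>
    <FileGrp VERSDATE='1998-09-01' ADMID='HIREZJPG IPRIGHTS IMGSRC'>
        <File ID='HRJ1' />
    </FileGrp>
    <AdminMD ID='HIREZJPG'>
        <FileMgmt>
            <Image>
                <Compression>JPEG</Compression>
                <Dimensions X='1536' Y='1024' />
            </Image>
        </FileMgmt>
    </AdminMD>
</Doc>
"""


def resolve_admid(root, file_grp):
    """Split ADMID on whitespace and look up each name among <AdminMD> IDs."""
    sections = {adm.get("ID"): adm for adm in root.iter("AdminMD")}
    names = file_grp.get("ADMID", "").split()
    found = {name: sections[name] for name in names if name in sections}
    missing = [name for name in names if name not in sections]
    return found, missing


root = ET.fromstring(DOC_XML.strip())
found, missing = resolve_admid(root, root.find("FileGrp"))
dims = found["HIREZJPG"].find("FileMgmt/Image/Dimensions")
```

Here found holds the HIREZJPG section, whose <Dimensions> element carries the 1536 by 1024 pixel size, while missing lists the two names that this trimmed sample does not define.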

Structural Map

The structural map section of a document defines a hierarchical structure (or structures) which will eventually be presented to users of the electronic archival object to allow them to navigate through the document. The structural map encodes this hierarchy as a nested series of <div> elements. Each <div> carries attribute information specifying what kind of division it is, and may also contain multiple file pointer (<fptr>) elements. File pointers specify files (or in some cases, locations within files) that correspond to this location in the hierarchy encoded in the structural map. A <div> can also contain an <mptr> element, which allows an MOA2 object to point to another MOA2 object that contains it, or to another MOA2 object that is a subsidiary resource.

Some structural maps may be extremely simple. In the case of the stereographic image, for example, the hierarchy only has a single level, the stereographic image itself:

<StructMap>
    <div N='1' TYPE='Stereograph' LABEL='Secrettown. 62 miles -- altitude 3,000 feet. Photographer&#39;s number: 47' DESCMD='DM1'>
        <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
        <fptr FILEID='LRJ1' MIMETYPE='image/jpeg' />
        <fptr FILEID='GIF' MIMETYPE='image/gif' />
    </div>
</StructMap>

The structural map in this case specifies that the object has only a single level of hierarchy (one <div>), and only a single item at that level (N=’1’). The type of object at this level of hierarchy is a stereograph (TYPE=’Stereograph’), and the stereograph has a label which should be displayed to the user (‘Secrettown. 62 miles – altitude 3,000 feet. Photographer’s number: 47’). The descriptive metadata corresponding to this (single) level of the hierarchy is contained in the <GDM> element with ID='DM1'. Several different files correspond to this level of the hierarchy; they are identified by their ID attributes.

To see this image, you could look at the file with the ID attribute in this document of ‘HRJ1’ (look at our previous <File> example and you’ll see it has an ID attribute of HRJ1). You could also look at two other files identified as ‘LRJ1’ and ‘GIF’; if you examine the full document in Appendix I, you’ll see these are low resolution and thumbnail versions of the image.
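
The lookup just described, from an <fptr> to the <File> it names, might be sketched in Python as shown here. As before, the <Doc> wrapper and the helper name locations_for_div are hypothetical, and the sample includes only the HRJ1 file so that the unresolved LRJ1 pointer illustrates the missing case.

```python
import xml.etree.ElementTree as ET

# <Doc> is a hypothetical wrapper; only one <File> is included on purpose.
DOC_XML = """
<Doc>
    <FileGrp>
        <File ID='HRJ1'>
            <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/CPR-hi.jpg</FLocat>
        </File>
    </FileGrp>
    <StructMap>
        <div N='1' TYPE='Stereograph' DESCMD='DM1'>
            <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ1' MIMETYPE='image/jpeg' />
        </div>
    </StructMap>
</Doc>
"""


def locations_for_div(root, div):
    """Map each FILEID under a <div> to its <FLocat> text, or None if unresolved."""
    files = {f.get("ID"): f for f in root.iter("File")}
    result = {}
    for fptr in div.findall("fptr"):
        file_id = fptr.get("FILEID")
        file_elem = files.get(file_id)
        flocat = file_elem.find("FLocat") if file_elem is not None else None
        if flocat is not None and flocat.text:
            result[file_id] = flocat.text.strip()
        else:
            result[file_id] = None
    return result


root = ET.fromstring(DOC_XML.strip())
top_div = root.find("StructMap/div")
div_locations = locations_for_div(root, top_div)
```

A None value signals a FILEID with no matching <File> in the inventory, which a real processor would probably want to treat as an error.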

A more complicated example of a structural map can be found in the following very abbreviated example, which provides a portion of the structural map for four electronic versions of the Breen Diary from the Bancroft Archive:

<StructMap>
    <div N='1' TYPE='Book' LABEL='Diary of Patrick Breen one of the Donner Party 1846-57. Presented by Dr. Geo McKinstry to
    Bancroft Library' DESCMD='DM1'>
        <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
        <fptr FILEID='LRJ1' MIMETYPE='image/jpeg' />
        <fptr FILEID='LRG1' MIMETYPE='image/gif' />
        <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='titlepage' />
        <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
            <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ2' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRG2' MIMETYPE='image/gif' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1'/>
        </div>
        <div N='2' TYPE='Entry' LABEL='Entry sat. 21st [Page 2]'>
            <fptr FILEID='HRJ3' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ3' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRG3' MIMETYPE='image/gif' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry2'/>
        </div>
    </div>
</StructMap>

This structural map indicates the document has a two-level hierarchy: it is a ‘Book’ with multiple ‘Entry’ components. In this particular example, there are four different versions of the book in question: high resolution JPG page images, low resolution JPG page images, thumbnail GIF page images, and an SGML-encoded transcription of the diary. If you wished to examine the first entry (Friday Nov. 20th 1846), there are four files you could examine: three different types of page image files (HRJ2, LRJ2, and LRG2), or the SGML file (T1). Note that in the case of the SGML file, one additional piece of information is provided as an attribute, a TAGID (‘entry1’). This indicates that within the file identified within this document as ‘T1,’ you should find an SGML element with the ID attribute value of ‘entry1.’ This element within the SGML document marks the beginning of the diary entry in question.
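
A navigation tool would typically need to walk this nested <div> structure. The Python sketch below shows one way to do so with a recursive generator; the sample is a shortened copy of the Breen diary map, and the function name outline and the tuple layout it yields are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Breen diary structural map.
STRUCTMAP_XML = """
<StructMap>
    <div N='1' TYPE='Book' LABEL='Diary of Patrick Breen'>
        <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='titlepage' />
        <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
            <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1' />
        </div>
        <div N='2' TYPE='Entry' LABEL='Entry sat. 21st [Page 2]'>
            <fptr FILEID='HRJ3' MIMETYPE='image/jpeg' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry2' />
        </div>
    </div>
</StructMap>
"""


def outline(div, depth=0):
    """Yield (depth, TYPE, LABEL, pointers) for a <div> and its descendants."""
    # Each pointer is a (FILEID, TAGID) pair; TAGID is None when absent.
    pointers = [(p.get("FILEID"), p.get("TAGID")) for p in div.findall("fptr")]
    yield depth, div.get("TYPE"), div.get("LABEL"), pointers
    for child in div.findall("div"):
        yield from outline(child, depth + 1)


struct_map = ET.fromstring(STRUCTMAP_XML.strip())
rows = list(outline(struct_map.find("div")))
```

Each row pairs a level of the hierarchy with its file pointers, so a pointer such as ("T1", "entry1") tells the processor both which file to open and which element within it to seek.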

Two final structural map examples below show the use of <mptr> elements. These examples expand on the Breen diary example above. The Breen diary actually consists of two entities: the diary itself and a letter that has been tipped into the back of the diary. Both the diary and the letter can be represented by separate MOA2 objects that refer to each other by <mptr> elements in their respective structural maps. The first example shows the <div> in the structural map for the diary that points to the letter tipped into it. This <div> is the last one in the <StructMap>.

<StructMap TYPE='logical'>
    <div N='1' TYPE='Book' LABEL='Diary of Patrick Breen one of the Donner Party 1846-57. Presented by Dr. Geo McKinstry to Bancroft
    Library' DESCMD='DMD1'>
        <fptr FILEID='HRJ1' MIMETYPE='image/jpeg' />
        <fptr FILEID='LRJ1' MIMETYPE='image/jpeg' />
        <fptr FILEID='LRG1' MIMETYPE='image/gif' />
        <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='titlepage' />
        <div N='1' TYPE='Entry' LABEL='Friday Nov. 20th 1846 [Page 1]'>
            <fptr FILEID='HRJ2' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ2' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRG2' MIMETYPE='image/gif' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry1'/>
        </div>
        …
        <div N='102' TYPE='Entry' LABEL='Mond. March the 1st [Page 29]'>
            <fptr FILEID='HRJ30' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ30' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRG30' MIMETYPE='image/gif' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='entry102'/>
        </div>
        <div N='1' TYPE='Letter' LABEL='Letter by George McKinstry, tipped into original diary' DESCMD='DMD2'>
            <mptr xlink:href='http://sunsite.berkeley.edu/~jmcdonou/BREEN/breen.letter.xml' xlink:role='contained'
            xlink:title='Letter by George McKinstry, tipped into original diary' />
        </div>
    </div>
</StructMap>

Note that the <mptr> in the example uses attributes from the xlink namespace to identify the URI for the MOA2 object for the associated letter. The title of the associated object is specified in an xlink:title attribute, and its relationship to the current MOA2 object in an xlink:role attribute.
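
Reading these namespaced attributes takes a little extra care in most XML libraries. The Python sketch below assumes that the xlink prefix is bound to the W3C XLink namespace URI (http://www.w3.org/1999/xlink); the fragments in this tutorial do not show that declaration, so the xmlns:xlink attribute in the sample is an addition made for the example, as is the helper name read_mptr.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; the tutorial fragments do not show the declaration.
XLINK_NS = "http://www.w3.org/1999/xlink"

DIV_XML = f"""
<div xmlns:xlink='{XLINK_NS}' N='1' TYPE='Letter'
     LABEL='Letter by George McKinstry, tipped into original diary' DESCMD='DMD2'>
    <mptr xlink:href='http://sunsite.berkeley.edu/~jmcdonou/BREEN/breen.letter.xml'
          xlink:role='contained'
          xlink:title='Letter by George McKinstry, tipped into original diary' />
</div>
"""


def read_mptr(div):
    """Return the xlink attributes of a <div>'s <mptr>, or None if it has none."""
    mptr = div.find("mptr")
    if mptr is None:
        return None
    # ElementTree exposes namespaced attributes as '{namespace-uri}localname'.
    return {
        "href": mptr.get(f"{{{XLINK_NS}}}href"),
        "role": mptr.get(f"{{{XLINK_NS}}}role"),
        "title": mptr.get(f"{{{XLINK_NS}}}title"),
    }


letter_div = ET.fromstring(DIV_XML.strip())
mptr_info = read_mptr(letter_div)
```

If an actual MOA2 document binds the xlink prefix to a different URI, XLINK_NS would need to be changed to match.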

The MOA2 object for the letter tipped into the Breen Diary refers back to the MOA2 object for the diary that contains it. This is shown in the <mptr> element under the root <div> in the example below:

<StructMap TYPE='logical'>
    <div N='1' TYPE='Letter' LABEL='Letter by George McKinstry, tipped into original diary'>
        <mptr xlink:href='http://sunsite.berkeley.edu/~jmcdonou/BREEN/breenMOA2.xml' xlink:role='container'
        xlink:title='Patrick Breen Diary : ms., 1846 Nov. 20-1847 Mar. 1.' />
        <div N='1' TYPE='Page' LABEL='Letter by George McKinstry, tipped into original diary, Page 1'>
            <fptr FILEID='HRJ31' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ31' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRG31' MIMETYPE='image/gif' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='GMletter1'/>
        </div>
        <div N='2' TYPE='Page' LABEL='Letter by George McKinstry, tipped into original diary, Page 2'>
            <fptr FILEID='HRJ32' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRJ32' MIMETYPE='image/jpeg' />
            <fptr FILEID='LRG32' MIMETYPE='image/gif' />
            <fptr FILEID='T1' MIMETYPE='text/sgml' TAGID='GMletter2'/>
        </div>
    </div>
</StructMap>

Conclusion

The DTD provides a fairly flexible mechanism for encoding the descriptive, administrative and structural metadata that describe the files comprising multiple electronic versions of an archival object and their relationships. It also manages to encode this information in a relatively efficient format. This flexibility and efficiency do come at the cost of some complexity. However, it is anticipated that documents will be primarily machine-generated and machine-processed for display, so that complexity should be relatively well hidden from both those producing documents and users examining them.