BrainDump Paper #2
Interface and Repository Considerations
For a System Architecture
To Support A Distributed Digital Library
DRAFT -- 12/17/97
Bernie Hurley
Chief Scientist, UC Berkeley Library
Libraries have long supported a culture of sharing in which materials from one library's local collection are made available to users of other libraries on an as-needed basis. This culture has given rise to automated systems, such as union catalogs and interlibrary loan modules, which aid library users in identifying and acquiring print-based materials from remote collections. The foundations of these library systems are the best practices and community standards that define the encoding of catalog records and the procedures for interlibrary loan.
The Digital Library Federation (DLF) is pursuing the development of the new practices and community standards that will allow the culture of sharing within the library community to prosper in the digital age. For example, the DLF's Making of America II (MoA II) project will actively investigate the types of metadata that are required to share digital surrogates of primary source materials over the network.
The DLF has also expressed an interest in investigating the use of a distributed system architecture to support access to digital library objects no matter where they may be located on the network. This paper explores some of the features that would be desirable in a distributed system architecture, including flexibility, scalability and extensibility. Most importantly, it suggests an architecture that can grow as a coordinated, integrated system as new distributed repositories are added to the network. The goal is that the whole (the system of distributed repositories) be greater than the sum of the parts (the individual repositories).
DESIRABLE FEATURES OF A SYSTEMS ARCHITECTURE FOR A DIGITAL LIBRARY
This section proposes a system architecture by describing a list of features that would be desirable in a distributed digital library system. Please note that this is a conceptual architecture. Examples of design and implementation issues are touched on in the next section.
- Scalability:
A distributed system architecture would be made up of many network repositories that hold multiple classes of digital library objects (e.g., digital books, serials, photographs, etc.). Attempting to create a single, centralized repository is not a feasible architecture for the DLF, as it would not scale given the potential size and number of digital library objects that will come to exist over time.
- Support for "Well Defined" Classes of Objects:
The repositories would be populated by "well defined" classes of digital library objects (e.g., digital books, serials, photographs, etc.). The phrase "well defined" implies that the metadata and content encoding for each object within a particular class would be consistent and predictable. In addition, each class of digital object would support a set of base methods (i.e., program code) that would implement behaviors for that object class. For example, a digital book would support behaviors to fetch the next page, the previous page, the table of contents, etc. It should be noted that creating "well defined" classes for certain types of digitized primary source materials is one of the goals of the DLF's MoA II project.
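The idea of a class with base methods can be made concrete with a small sketch. The Java below is illustrative only: the interface name DigitalBook, its method names, and the in-memory SimpleDigitalBook are assumptions made for this example, not definitions from the MoA II project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base methods for the "digital book" class (names are assumptions).
interface DigitalBook {
    String currentPage();
    String nextPage();      // moves forward one page; stays on the last page at the end
    String previousPage();  // moves back one page; stays on the first page at the start
    List<String> tableOfContents();
}

// One possible implementation; in-memory lists stand in for real repository storage.
class SimpleDigitalBook implements DigitalBook {
    private final List<String> contents;
    private final List<String> pages;
    private int index = 0;

    SimpleDigitalBook(List<String> contents, List<String> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("a book needs at least one page");
        }
        this.contents = contents;
        this.pages = pages;
    }

    public String currentPage() { return pages.get(index); }

    public String nextPage() {
        if (index < pages.size() - 1) index++;
        return pages.get(index);
    }

    public String previousPage() {
        if (index > 0) index--;
        return pages.get(index);
    }

    public List<String> tableOfContents() { return contents; }
}

class DigitalBookDemo {
    public static void main(String[] args) {
        DigitalBook book = new SimpleDigitalBook(
            Arrays.asList("Chapter 1", "Chapter 2"),
            Arrays.asList("page one", "page two", "page three"));
        System.out.println(book.tableOfContents());
        System.out.println(book.nextPage());
        System.out.println(book.previousPage());
    }
}
```

Any repository could store its books differently, as long as its objects honor the same set of base methods.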
- Support for Standardized Tools Across Repositories:
Given that the objects within each repository belong to well defined classes, it will be possible to create standard tools that work across repositories. These tools would be built up from object behaviors to implement the digital library services required by the DLF. For example, a book viewer tool could be built to display and navigate digital books from different repositories. As each digital book object would support the base methods for that class, the tools could use these behaviors. To continue our example, our digital book viewer tool would ask a book object for the next page (i.e., execute the object's method to fetch the next page). The viewer tool would then display the new page for the user.
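A minimal sketch of such a cross-repository tool follows. It assumes a hypothetical PagedBook interface as the shared base behavior (hasNextPage is an addition made for this example) and two invented storage styles standing in for two different repositories. The viewer is written only against the interface, so it does not need to know which repository a book came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical shared base behavior for books (an assumption for this sketch).
interface PagedBook {
    boolean hasNextPage();
    String nextPage();
}

// Stand-in for repository A: pages are stored as a list.
class ListBackedBook implements PagedBook {
    private final Iterator<String> pages;
    ListBackedBook(List<String> pages) { this.pages = pages.iterator(); }
    public boolean hasNextPage() { return pages.hasNext(); }
    public String nextPage() { return pages.next(); }
}

// Stand-in for repository B: the whole text is stored once, pages separated by form feeds.
class TextBackedBook implements PagedBook {
    private final String[] pages;
    private int next = 0;
    TextBackedBook(String text) { this.pages = text.split("\f"); }
    public boolean hasNextPage() { return next < pages.length; }
    public String nextPage() { return pages[next++]; }
}

// The tool: it only calls the base methods, never repository-specific code.
class BookViewer {
    List<String> displayAll(PagedBook book) {
        List<String> shown = new ArrayList<>();
        while (book.hasNextPage()) {
            shown.add("[page] " + book.nextPage());
        }
        return shown;
    }
}

class BookViewerDemo {
    public static void main(String[] args) {
        BookViewer viewer = new BookViewer();
        System.out.println(viewer.displayAll(new ListBackedBook(Arrays.asList("one", "two"))));
        System.out.println(viewer.displayAll(new TextBackedBook("one\ftwo")));
    }
}
```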
- Repository Tool Self-Sufficiency:
This concept holds that each repository supports a number of base-line tools for each type of object it holds. That is, each repository is self-sufficient in terms of providing some level of access to the objects it holds. The DLF as a whole would have to determine the base-line tools for each class of digital object.
- Tools are Implemented as Java Applets:
The tools would be downloaded to a Web browser as needed. For example, if a user wished to view and navigate a particular digital book, the repository would know to send the digital book viewer.
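One way to picture how a repository "knows" which tool to deliver is a simple lookup from object class to base-line tool. The sketch below is an assumption-laden illustration: ToolRepository, its methods, and the tool names are invented for this example, and it does not use the applet API itself. It also enforces the self-sufficiency idea by refusing to accept an object whose class has no registered base-line tool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical repository bookkeeping (all names are assumptions for this sketch).
class ToolRepository {
    private final Map<String, String> toolForClass = new HashMap<>();
    private final Map<String, String> classOfObject = new HashMap<>();

    // Register the base-line tool that serves one class of object.
    void registerTool(String objectClass, String toolName) {
        toolForClass.put(objectClass, toolName);
    }

    // Self-sufficiency: an object is accepted only if its class already has a tool.
    void addObject(String objectId, String objectClass) {
        if (!toolForClass.containsKey(objectClass)) {
            throw new IllegalStateException("no base-line tool for class: " + objectClass);
        }
        classOfObject.put(objectId, objectClass);
    }

    // Which tool should be sent to the browser for this object, if the object is known?
    Optional<String> toolFor(String objectId) {
        return Optional.ofNullable(classOfObject.get(objectId)).map(toolForClass::get);
    }
}

class ToolRepositoryDemo {
    public static void main(String[] args) {
        ToolRepository repository = new ToolRepository();
        repository.registerTool("book", "BookViewer");
        repository.addObject("book-001", "book");
        System.out.println(repository.toolFor("book-001").orElse("none"));
    }
}
```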
- The System is Extensible and Flexible:
As best practices are created for encoding new classes of digital objects, these objects (including their methods) could be added to any repository. The corresponding base-line tools would also be added to the repository. These new classes of objects would then be accessible to any user via the Java applet based tools provided by the repositories. This design would support any number of digital library classes.
In addition, the tools themselves are theoretically extensible. If one does not like a particular tool, it could be extended by subclassing the tool and overriding its methods to create a new tool. Similarly, even object behaviors for a class could be modified by subclassing. An "organizational problem" occurs if the new, desired functionality of the tool requires additional metadata. In this case, it would be best to extend the metadata for that class of object in every repository. It is possible, however, to extend the metadata in only one repository in such a manner that the base-line tools still work on the extended objects. The new tool will then not work on objects of the same class in other repositories that lack the added metadata. In effect, adding new metadata to a given class creates a new class.
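The subclassing scenario can be sketched in a few lines of Java. Everything here is hypothetical: BookObject, the added plateCount metadata, and the two viewers are invented to illustrate the point. The base-line viewer still works on the extended object, while the new tool refuses objects that lack the added metadata.

```java
// Base object class with the agreed metadata (here only a title).
class BookObject {
    private final String title;
    BookObject(String title) { this.title = title; }
    String title() { return title; }
}

// One repository extends the metadata; in effect this is a new class.
class IllustratedBookObject extends BookObject {
    private final int plateCount;
    IllustratedBookObject(String title, int plateCount) {
        super(title);
        this.plateCount = plateCount;
    }
    int plateCount() { return plateCount; }
}

// Base-line tool: works on every BookObject, extended or not.
class BaselineViewer {
    boolean supports(BookObject book) { return true; }
    String show(BookObject book) { return "Title: " + book.title(); }
}

// New tool created by subclassing; it needs the added metadata.
class PlateViewer extends BaselineViewer {
    @Override
    boolean supports(BookObject book) { return book instanceof IllustratedBookObject; }

    @Override
    String show(BookObject book) {
        if (!supports(book)) {
            throw new IllegalArgumentException("object lacks plate metadata");
        }
        int plates = ((IllustratedBookObject) book).plateCount();
        return super.show(book) + " (" + plates + " plates)";
    }
}

class SubclassingDemo {
    public static void main(String[] args) {
        BookObject plain = new BookObject("Example Book");
        BookObject rich = new IllustratedBookObject("Example Atlas", 12);
        System.out.println(new BaselineViewer().show(rich));
        System.out.println(new PlateViewer().show(rich));
        System.out.println(new PlateViewer().supports(plain));
    }
}
```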
- The Whole is Greater than the Sum of the Parts:
Adding new repositories or new classes of objects to the system only adds to its value, as the additions become available to users in an integrated manner through the standardized, base-line tools supported by the repository. As the DLF expands its repositories and supported object classes, they blend into the architecture as welcome additions. That is, the system grows along with the DLF.
IMPLEMENTATION DETAILS
The experienced system designer will immediately want to address the great number of implementation issues suggested by the conceptual system architecture proposed above. This section gives examples of some of these issues.
- Why Java Applets?
Theoretically, an architecture similar to the one described above could be implemented with HTML and CGI scripts. That is, each repository could support base-line tools built from CGI scripts. However, it can be argued that HTML/CGI becomes unmaintainable in a complex system environment. In fact, it is difficult to see a real architecture in linking together thousands of HTML pages with hundreds of CGI scripts. The object oriented approach, as supported by the Java language and Java applets, creates the possibility for a maintainable architecture through the generation of digital objects that encapsulate both content and methods. In addition, this architecture becomes both flexible and extensible by supporting object reuse and subclassing based on inheritance.
- Why Wasn't a Distributed Object Architecture Proposed?
Indeed, a distributed object architecture supported by middleware such as CORBA or DCOM is a very attractive design consideration. CORBA, for example, has built-in support for distributed repositories, object naming and event services. If one is building a system for the future, implementing a distributed object architecture by making the Java applets CORBA aware (i.e., creating an Object Web) has many advantages, not the least of which is the ability to have some object methods execute server-side in a transparent manner. However, middleware like CORBA does have some problems in open systems based on today's technologies (mostly performance issues related to network speeds). While these will most likely be solved in the next few years, the author did not want the discussion of the proposed architecture to become a referendum on the current state of distributed object technologies. Author's Note: If I were designing a system, I would certainly be looking at CORBA very carefully.
- How is Search and Retrieval Implemented?
In theory, simultaneously searching multiple repositories could be supported by this architecture. In fact, it may be best to support a central union catalog created from appropriate metadata for the discovery function until better distributed search solutions are developed.
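As a rough illustration of the union catalog option, the sketch below gathers title metadata from several repositories into one central index and answers searches with the location of each hit. The class names, the harvest step, and the simple substring match are all assumptions made for this example; a real catalog would use richer metadata and a proper index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// One catalog record: where the object lives plus the metadata used for discovery.
class CatalogEntry {
    final String repository;
    final String objectId;
    final String title;

    CatalogEntry(String repository, String objectId, String title) {
        this.repository = repository;
        this.objectId = objectId;
        this.title = title;
    }
}

// Hypothetical central union catalog built from metadata supplied by each repository.
class UnionCatalog {
    private final List<CatalogEntry> entries = new ArrayList<>();

    // Copy (object id -> title) metadata from one repository into the catalog.
    void harvest(String repository, Map<String, String> titlesById) {
        for (Map.Entry<String, String> e : titlesById.entrySet()) {
            entries.add(new CatalogEntry(repository, e.getKey(), e.getValue()));
        }
    }

    // Case-insensitive substring match on the title.
    List<CatalogEntry> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<CatalogEntry> hits = new ArrayList<>();
        for (CatalogEntry entry : entries) {
            if (entry.title.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry);
            }
        }
        return hits;
    }
}

class UnionCatalogDemo {
    public static void main(String[] args) {
        Map<String, String> repoA = new LinkedHashMap<>();
        repoA.put("a1", "Harbor Photographs");
        Map<String, String> repoB = new LinkedHashMap<>();
        repoB.put("b1", "Harbor Survey Maps");

        UnionCatalog catalog = new UnionCatalog();
        catalog.harvest("repo-a", repoA);
        catalog.harvest("repo-b", repoB);
        for (CatalogEntry hit : catalog.search("harbor")) {
            System.out.println(hit.repository + "/" + hit.objectId + ": " + hit.title);
        }
    }
}
```

The search happens in one place, while the objects themselves stay in their home repositories and are fetched only after discovery.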
- Does one Really Want to Encapsulate all the Metadata, Content and Methods Together in the same Object?
Conceptually, yes. Once an object is instantiated, all this information should be available. However, until the object is needed, its parts can live in different places (i.e., databases).
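A small sketch may help show how this could work. It assumes two separate stores (plain maps standing in for different databases) and a hypothetical ObjectAssembler that brings the parts together only when an object is instantiated; none of these names come from an existing system.

```java
import java.util.HashMap;
import java.util.Map;

// Once instantiated, the object carries its metadata, content and methods together.
class DigitalObject {
    private final String id;
    private final String metadata;
    private final String content;

    DigitalObject(String id, String metadata, String content) {
        this.id = id;
        this.metadata = metadata;
        this.content = content;
    }

    String id() { return id; }
    String metadata() { return metadata; }
    String content() { return content; }

    // An example method that uses both parts.
    String summary() { return metadata + ": " + content.length() + " characters"; }
}

// Hypothetical assembler; the two maps stand in for two separate databases.
class ObjectAssembler {
    private final Map<String, String> metadataStore;
    private final Map<String, String> contentStore;

    ObjectAssembler(Map<String, String> metadataStore, Map<String, String> contentStore) {
        this.metadataStore = metadataStore;
        this.contentStore = contentStore;
    }

    // Gather the parts from their separate homes and build the complete object.
    DigitalObject instantiate(String id) {
        String metadata = metadataStore.get(id);
        String content = contentStore.get(id);
        if (metadata == null || content == null) {
            throw new IllegalArgumentException("unknown or incomplete object: " + id);
        }
        return new DigitalObject(id, metadata, content);
    }
}

class ObjectAssemblerDemo {
    public static void main(String[] args) {
        Map<String, String> metadataStore = new HashMap<>();
        Map<String, String> contentStore = new HashMap<>();
        metadataStore.put("obj-1", "Example Letter");
        contentStore.put("obj-1", "Dear reader");

        ObjectAssembler assembler = new ObjectAssembler(metadataStore, contentStore);
        System.out.println(assembler.instantiate("obj-1").summary());
    }
}
```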
-=> END <=-