BrainDump #2

Interface and Repository Considerations

For a System Architecture

To Support A Distributed Digital Library

DRAFT -- 3/23/98

Bernie Hurley

Chief Scientist, UC Berkeley Library

Introduction


An Overview of the Model

An Analysis of the Model


A Three Tiered Implementation

Tier 1 – The Client

Tier 2 – The Application Layer

Tier 3 – The Repository

Adding More Detail to the Implementation

Integrating Z39.50 into the Implementation



The first paper in this series, The Need for Best Practices in Creating Digital Library Objects, discussed the culture of sharing long held by libraries and the need for new community standards to ensure that this culture migrates to the digital environment. Specifically, it described a Digital Library Service Model in which services and tools are built on a foundation of digital library objects created to community standards. Given these standard objects, services and tools could be created in a manner that integrates these materials across distributed repositories. That is, a tool could discover, display, navigate and manipulate digital objects of a given class, regardless of where they were stored on the network.

This paper explores some of the features that would be desirable in a distributed architecture, including flexibility, scalability and extensibility. Most importantly, it suggests an architecture that can grow as a coordinated, integrated system as new distributed repositories are added to the network. The goal is that the whole (the system of distributed repositories) be greater than the sum of its parts (the individual repositories).



An Overview of the Model

This section proposes a simple model for interface and repository relationships in a distributed digital library system architecture (Figure 1). Not surprisingly, the model is client/server based.

Two notable aspects of this model are:

The repositories are populated with different classes of digital objects, but objects of the same class use the same encoding to encapsulate their methods, metadata and content.

Each repository must contain a base-level tool for each class of object it holds. These tools are used to discover, display, navigate and manipulate these objects.

In this model, the client would first invoke a discovery tool that would use descriptive metadata from objects to help the user identify items of interest. Once the user selected an item, its repository would provide the proper tool to display, navigate and manipulate that object. The structural and administrative metadata needed for these tasks would be available to the tool through method calls to the object.

An Analysis of the Model

This model can be investigated by reviewing a list of features that would be desirable in a distributed digital library system. Please note that this is a conceptual architecture. Examples of specific design and implementation issues are touched on in the next section.

1) Scalability: A distributed system architecture would consist of many network repositories that each hold multiple classes of digital library objects (e.g., digital books, serials, photographs, etc.). Attempting to create a single, centralized repository is not feasible, as it would not scale given the potential size and number of digital library objects that will come to exist over time. The proposed model would scale, as objects can populate many distributed repositories and the tools to display and manipulate these objects would be available from the repositories that hold the objects.

2) Support for Standard Classes of Objects: The repositories would be populated by classes of digital library objects (e.g., digital books, serials, photographs, etc.) encoded to community standards. That is, the metadata and content encoding for each object within a particular class would be consistent and predictable. In addition, each class of digital object would support a set of base methods (i.e., program code that is conceptually part of the object) that would be used to implement user level behaviors (i.e., how users describe what tools do). For example, a user level behavior such as the ability to "turn pages" in a digital book would be supported by methods that could fetch the next page, the previous page, or the table of contents, jump ahead n pages, etc. It should be noted that creating standard objects for certain classes of digitized primary source materials is a goal of the DLF’s MoA II project.
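As a minimal sketch of what such base methods might look like, the Java fragment below defines an interface for the digital book class and a toy in-memory implementation. The names DigitalBook, InMemoryBook and the individual method names are illustrative assumptions and are not drawn from any community standard; a real object would presumably consult its structural metadata rather than a simple list of pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base methods for the "digital book" class of objects.
interface DigitalBook {
    String tableOfContents();
    String currentPage();
    String nextPage();
    String previousPage();
    String jumpAhead(int n);
}

// A toy in-memory book (an assumption made for illustration only).
class InMemoryBook implements DigitalBook {
    private final List<String> pages;
    private int index = 0;

    InMemoryBook(List<String> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("a book needs at least one page");
        }
        this.pages = new ArrayList<>(pages);
    }

    public String tableOfContents() { return pages.size() + " pages"; }
    public String currentPage() { return pages.get(index); }
    public String nextPage() { return jumpAhead(1); }
    public String previousPage() { return jumpAhead(-1); }

    // Moves n pages forward (back when n is negative), clamped to the book's bounds.
    public String jumpAhead(int n) {
        index = Math.max(0, Math.min(pages.size() - 1, index + n));
        return pages.get(index);
    }
}
```

A "turn pages" behavior in a viewer tool would then reduce to calls such as book.nextPage() and book.previousPage(), whichever repository supplied the object.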

3) Repository Integration through Support of Standardized Tools: Given that the objects within each repository belong to well defined classes, it will be possible to create standard tools that work across repositories. These tools would be built up from object behaviors to implement the digital library services required by each user population. For example, a book viewer tool could be built to display and navigate digital books from different repositories. As each digital book object would support the base methods for that class, the tools could use these to implement user level behaviors. To continue our example, our digital book viewer tool would ask a book object for the next page (i.e., execute the object’s method to fetch the next page). The viewer tool would then display the new page for the user.

4) Tools as Part of the Repository: This concept holds that tools actually reside inside repositories. That is, each repository must include a base-line tool to display and navigate each type of object it holds. In addition, the repository is responsible for delivering tools to the user on demand. This makes each repository self-sufficient in terms of providing some level of access and navigation for the objects it holds. This is especially desirable in that some level of access to a repository’s objects can be provided to communities that may not have the resources to create their own tools (e.g., public libraries).

5) The System is Extensible and Flexible: As best practices are created for encoding new classes of digital objects, these objects (including their methods) could be added to any repository. The corresponding base-line tools would also be added to the repositories that held these objects. The new classes of objects would then be accessible to any user through the tools provided by the repositories. This design would support any number of digital library classes.

In addition, the tools themselves are theoretically extensible. If one does not like a particular tool, it could be extended by subclassing the tool’s methods to create a new tool. Similarly, even methods for a digital object could be modified by subclassing.

An "organizational problem" occurs if extending the functionality of a tool requires additional metadata. In this case, it would be best to add the new metadata to every object, in all repositories. However, it is possible to extend the metadata of only some objects in such a manner that the base-line tools still work on the extended objects. The new tool will not work on the same objects in other repositories that lack the added metadata. In fact, by adding new metadata to a given class, a new class has been created.
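The interplay between subclassed tools and extended metadata can be sketched in Java as follows. This is only an illustration under stated assumptions: the class names, the metadata keys "title" and "summary", and the choice to fall back to base-line behavior when the extra metadata is missing are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical object of some standard class, carrying descriptive metadata.
class BookObject {
    private final Map<String, String> metadata = new HashMap<>();

    BookObject(String title) { metadata.put("title", title); }

    // Adds an extra metadata field and returns this object for chaining.
    BookObject with(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    Optional<String> metadata(String key) {
        return Optional.ofNullable(metadata.get(key));
    }
}

// Base-line tool: relies only on metadata that every object of the class carries.
class BaseLineViewer {
    String describe(BookObject book) {
        return book.metadata("title").orElse("(untitled)");
    }
}

// Extended tool made by subclassing; it needs an extra "summary" field (assumed name).
class SummaryViewer extends BaseLineViewer {
    // Reports whether this object carries the metadata the extended tool needs.
    boolean supports(BookObject book) {
        return book.metadata("summary").isPresent();
    }

    @Override
    String describe(BookObject book) {
        String base = super.describe(book);
        // Falls back to the base-line description when the summary is absent.
        return book.metadata("summary").map(s -> base + ": " + s).orElse(base);
    }
}
```

In this sketch the base-line viewer works on both plain and extended objects, while supports() lets the extended tool detect objects from repositories that lack the added metadata.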

6) The Possibility for Joint Development: Developers have long dreamed of sharing the expensive programming efforts needed to create software systems. This architecture could provide a first step in this direction. For example, it should be possible for tool developers to re-use object level methods. At some level, all tools for a given class of object need the same functionality (e.g., getting the next page of a digital book).

It would also be possible to share the work of developing the base-line tools among a number of organizations. This is especially desirable in that the digital library will have many more classes of objects than a print library, such as social science numeric data, museum artifacts, GIS data, etc. Each of these classes will need special tools for display, navigation and manipulation. In fact, given a broad definition of a digital library that contains many objects provided from traditionally different communities, it may not be possible for any one organization to provide all the tools needed for its clientele. One should not assume that having a base-line tool supported by one organization means another organization cannot extend that tool (e.g., change the user interface). In this case, both tools would be available.

7) The Whole is Greater than the Sum of the Parts: Adding new repositories or new classes of objects in a distributed digital library only adds to its value, as the additions become available to all users through the standardized, base-line tools supported by the repository. As repositories and supported object classes expand, they blend into the architecture as welcome additions. That is, the system grows in an integrated manner.



A Three Tiered Implementation

The experienced system designer will immediately want to address the great number of implementation issues suggested by the conceptual system architecture proposed above. This section suggests one possible implementation using a distributed object architecture (Figure 2).

Tier 1 – The Client

The client in this implementation would be any Web browser that supported a Java Virtual Machine. HTML (or XML) would be used to create the pages for the digital library application. However, the tools to discover, display, navigate and manipulate the repositories and objects would be implemented as Java Applets, which would be uploaded on demand. For example, if a user wished to view a particular digital book, the repository that held that book would upload the digital book viewer.

Why Java Applets? Theoretically, the client services as described above could be implemented with HTML and CGI scripts. That is, each repository could support base-line tools built from CGI scripts. However, it can be argued that HTML/CGI is not maintainable in a complex, distributed architecture. In fact, it is difficult to see a coherent architecture in linking together thousands of HTML pages with hundreds of CGI scripts. The object oriented approach, as supported by the Java language and Java applets, creates the possibility for a maintainable architecture through the generation of digital objects that encapsulate both content and methods. In addition, this architecture becomes both flexible and extensible by supporting object reuse and subclassing based on inheritance.

Tier 2 – The Application Layer

The application code would actually be distributed between the client and network application servers. In this proposed implementation, the support for the distributed objects would be provided by middleware such as CORBA (or DCOM). For example, a base-line tool would be implemented as a Java applet that was CORBA-aware (CORBA-ized?, CORBA-rated?). The applet tool would be developed using both its own methods and ones invoked from objects. All of the method signatures (i.e., calls to the methods) would be part of the applet. Methods that are to execute on the client would have their program code embedded in the applet. For methods that are to execute server-side, the CORBA ORB would trap the invocation and have the code executed on the server (note: it looks to the tool as if the code were executed on the client). The CORBA middleware gives developers the ability to move code execution between the client and server, as is most appropriate.
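The idea of trapping an invocation so that the tool cannot tell where a method runs can be illustrated, very loosely, with a plain Java dynamic proxy. The sketch below is not CORBA and involves no network: java.lang.reflect.Proxy merely stands in for the ORB's interception, and the names BookMethods, BookMethodsImpl and OrbLikeStub, as well as the split between server-side and client-side methods, are assumptions made for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Set;

// Hypothetical method signatures that a viewer tool would compile against.
interface BookMethods {
    String fetchPage(int number);   // assumed to run server-side
    String formatPage(String raw);  // assumed to run on the client
}

// A single implementation holds the code; the stub decides where a call is routed.
class BookMethodsImpl implements BookMethods {
    public String fetchPage(int number) { return "text of page " + number; }
    public String formatPage(String raw) { return "<p>" + raw + "</p>"; }
}

// Stand-in for ORB-style interception: records the side each call would run on.
class OrbLikeStub {
    static BookMethods wrap(BookMethods target, Set<String> serverSide, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            String side = serverSide.contains(method.getName()) ? "server" : "client";
            log.add(method.getName() + "@" + side);
            // A real ORB would marshal the call over the network for server-side methods;
            // here every call is simply forwarded to the local target.
            return method.invoke(target, args);
        };
        return (BookMethods) Proxy.newProxyInstance(
                BookMethods.class.getClassLoader(),
                new Class<?>[] { BookMethods.class },
                handler);
    }
}
```

The calling code uses the wrapped object exactly as it would the plain implementation, and moving a method between client and server amounts to changing the serverSide set rather than the tool.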

Distributed object middleware, such as CORBA, is attractive for many other reasons, including its support for distributed repositories, object naming and event services. If one is building a system for the future, implementing a distributed object architecture by making the Java applets CORBA-aware (i.e., creating an Object Web) has many advantages, not the least of which is the ability to have some object methods execute server-side in a transparent manner. However, applet-based, distributed object architectures will have problems in today’s open systems if not designed properly (mostly performance issues related to network speeds). For these reasons, we would expect system designers to keep the Java applet tools small and simple at first, by having many methods execute server-side. But as clients get more powerful and network bandwidth increases, the methods can be easily moved to the client.

Tier 3 – The Repository

The third tier is the database subsystem that implements the object repository. Both the metadata and the content of digital objects can reside here. This tier does not necessarily need to use object oriented database technologies; that choice is up to the repository’s designer.


Adding More Detail to the Implementation

Stepping through Figure 3 will help to explain the implementation in more detail.

    1. The Web Server uploads a discovery tool (applet) to the browser.
    2. The discovery applet uses CORBA to execute server-side methods that search a union catalog. Three bibliographic records, each uniquely identified by an object reference, are returned. For the sake of argument, assume that the first two are digital books and the third is a digitized photograph.
    3. The user selects the first record and the discovery applet passes that object reference to CORBA, which connects the applet to the repository that holds the digital book. Note that we can connect to the proper repository because its name is embedded in the CORBA object reference.
    4. The discovery applet then invokes a server-side method at the book’s home repository, which creates an HTML page with the appropriate tool (i.e., a Java applet book viewer) embedded inside, along with the object reference for the book.
    5. The HTML page is uploaded to the browser and the book viewer applet starts running.
    6. The viewer applet then passes the book’s object reference to the CORBA middleware, which again connects it to the proper repository from which this tool can display and navigate the digital book.
    7. Let’s assume that the user now goes back to the discovery applet and selects the third record. Once again, this applet will contact the repository, an HTML page is created and, in this case, an image viewer applet is run. Note that the photograph object can be in a completely different repository from the digital book. The proper repository is identified through the object reference.
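The routing at the heart of these steps (an object reference names its home repository, and that repository supplies the proper tool for the object's class) can be sketched in plain Java. This is a simplified stand-in, not CORBA: the ObjectRef fields, the Locator class, the applet names and the generated HTML are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical object reference: names the home repository, the object, and its class.
class ObjectRef {
    final String repository;
    final String objectId;
    final String objectClass;

    ObjectRef(String repository, String objectId, String objectClass) {
        this.repository = repository;
        this.objectId = objectId;
        this.objectClass = objectClass;
    }
}

// A repository keeps one base-line tool (an applet name here) per class it holds.
class Repository {
    final String name;
    private final Map<String, String> toolsByClass = new HashMap<>();

    Repository(String name) { this.name = name; }

    Repository addTool(String objectClass, String appletName) {
        toolsByClass.put(objectClass, appletName);
        return this;
    }

    // Step 4: build an HTML page embedding the proper tool and the object reference.
    String pageFor(ObjectRef ref) {
        String applet = toolsByClass.get(ref.objectClass);
        if (applet == null) {
            throw new IllegalArgumentException("no tool for class " + ref.objectClass);
        }
        return "<applet code=\"" + applet + "\">"
             + "<param name=\"ref\" value=\"" + name + "/" + ref.objectId + "\">"
             + "</applet>";
    }
}

// Stand-in for the middleware's role of resolving a reference to its home repository.
class Locator {
    private final Map<String, Repository> repositories = new HashMap<>();

    void register(Repository repository) { repositories.put(repository.name, repository); }

    // Steps 3 and 6: the repository is found from the name carried in the reference.
    Repository resolve(ObjectRef ref) {
        Repository home = repositories.get(ref.repository);
        if (home == null) {
            throw new IllegalArgumentException("unknown repository " + ref.repository);
        }
        return home;
    }
}
```

With a book repository and a photograph repository both registered, resolving the first record's reference would yield a page embedding the book viewer, while the third record's reference would lead to a different repository and an image viewer.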

Integrating Z39.50 into the Implementation

Z39.50 has become a popular standard for searching compliant online catalogs. Figure 4 demonstrates how Z39.50 can be added to the implementation. Note that the Web server is now Z39.50 compliant and searches remote online catalogs for the user. The catalog returns bibliographic records and a URL for a CGI script that will create the HTML page with the proper display and navigation tool. In this case the object’s URN must identify the repository, as well as the object.