EAD and Databases
Perl and ADO on MS Windows
Alvin Pollock Digital Publishing Group
This is an old tutorial I wrote describing Perl's database functionality
especially as applied to EAD. It is somewhat out of date now. It is geared
exclusively toward EAD version 1.0 and SGML. As I have learned more and
done this kind of programming many many times I have learned better ways
to do things but sadly have not had the time to update this tutorial. Many
of the concepts should still prove useful and so this tutorial remains. I
recommend especially the fundamentals section
which should give users with perl experience and database experience a
nice introduction on applying the first to the second.
I've actually done some work when I can on an updated tutorial. Look
for something new next year.
While you're here, please visit our
EAD Tools page for all of the publicly available
tools we use here at UC Berkeley.
Introduction
Here in the UC Berkeley Library's Digital Publishing Group
we have combined relational databases with EAD quite successfully
and applied this combination to many interesting uses. The core of
our strategy has been to create methods for easily importing EAD
into databases and methods for getting EAD out of databases. While
we've been able to export EAD for some time, as have many other
repositories, being able to import EAD is a fairly new ability for
us and has opened up a wide range of possibilities. This tutorial
will describe these techniques in detail.
While many repositories are beginning to explore using databases
to author new finding aids which can be exported to EAD, there has
always remained the problem of what to do with legacy finding aids
already encoded in EAD. Corrections and updates had to be applied
directly in EAD rather than within the authoring database. The ability
to import older EAD finding aids into the database means that
repositories can unify their authoring under a single tool. For
those repositories which use a commercial collection management
system or one of a home-grown variety, importing older EAD-encoded
finding aids means they now have ability to manage all of their
collections within the software. Finally, databases offer exciting
possibilities for displaying and manipulating finding aids. One
such application described here is a web-enabled finding aid database
which affords a repository much more power in publishing their
finding aids online than traditional stylesheet-driven methods.
Perl and ADO
All of the methods described here use the perl programming
language and the
Microsoft ActiveX Data Objects (ADO) API.
Why perl?
The techniques I describe here can be implemented in just
about any language with database support. MS Visual Basic, C++,
and Java, are all very popular choices. However, many institutions
working with EAD are very likely to already possess substantial
perl expertise. We have found perl absolutely essential for
retrospective conversion of our legacy finding aids. Our encoders
use perl on a daily basis. In my experience perl is equally
entrenched in other institutions involved with EAD. But many
individuals may not even be aware of perl's excellent support
for database programming.
Using perl for database programming has many advantages
and a few disadvantages. Perl is far slower than C++ and
somewhat slower than Visual Basic. If speed is essential,
either of these languages makes a better choice. But perl is
without doubt the language of choice for text manipulation.
Many of the example techniques here will leverage perl's
powerful regular expressions and a syntax which makes
manipulating text strings easier than most any other language.
One of the most common complaints from programmers new to
perl is all the "funny symbols." But those $'s and @'s and %'s
are exactly what makes perl so powerful when manipulating text.
Because of these symbols, perl can support variable interpolation.
In perl I can write something like this:
print "<a href=\"$href\" target=\"$target\" title=\"$title\">\n";
Or if escaping quotes with backslashes isn't your cup of tea:
print qq(<a href="$href" target="$target" title="$title">\n);
You'd need to write that in Visual Basic like this:
Print "<a href=""" & Href$ & """ target=""" & Target$ & _ """title=""" & Title$ & """>" & vbLf
While neither java or C++ supports variable interpolation, at
least you can escape quotes with a backslash.
Perl supports a wonderful text construct called here documents,
the darling of cgi programmers world-wide. Using a here document,
I can print this:
print <<EOF;
<c0$depth level="$level">
<did>
<container type="carton">$carton</container>
<container type="folder">$folder</container>
<unitid type="$level">$unitid</unitid>
<unittitle>$unittitle, <unitdate>$unitdate</unitdate></unittitle>
<physdesc>$physdesc</physdesc>
</did>
<scopecontent>
<p>$scopecontent</p>
</scopecontent>
EOF
I can't even bring myself to duplicate this in Visual Basic,
but in java you would need to do this:
eadComponent = "<c0" + depth + "level=\"" + level + "\">\n" +
"<did>\n" +
"<container type=\"carton\">" + carton + "</container>\n" +
"<container type=\"folder\">" + folder + "</container>\n" +
"<unitid type=\"" + level + "\">" + unitid + "</unitid>\n" +
"<unittitle>" + unittitle + ", <unitdate>" + unitdate +
"</unitdate></unittitle>\n" +
"<physdesc>" + physdesc + "</physdesc>\n" +
"</did>\n" +
"<scopecontent>\n" +
"<p>" + scopecontent + "</p>\n" +
"</scopecontent>\n"
Certainly those programmers for whom java or Visual Basic
is the language of choice and who may have no more than a
passing acquaintance with perl are already familiar with issues
such as these and constructing strings like the one above
is second nature. It's all a matter of selecting the right tool
for the right job and leveraging existing expertise to its fullest
extent.
Why ADO?
Just as there are many choices in the programming language
you may wish to employ, there are many other database APIs to
choose from besides ADO. In fact ADO is a relatively new API
in comparison to OBDC, JDBC, RDO, DBI, et al. It really comes
down to a matter of preference. As I am a fairly lazy programmer
and am loathe to spend more time than I have to learning a
new technology, I was attracted to ADO's ease of use and shallow
learning curve. It bundled up typically burdensome database
programming tasks into high-level abstract functions. Anyone who
has used INSERT INTO SQL syntax to add new records to a complex
database will immediately appreciate ADO's AddNew() method. And
Microsoft's addition of
Data Shaping to manipulate and retrieve hierarchical data
is especially useful for EAD. ADO is the easiest method I know
to program databases.
ADO's biggest drawback, however, is that it is a Microsoft-only
technology. Don't expect to apply this tutorial on your Sun
or Linux workstation. It's Windows only, sadly. Most repositories
nowdays use or have easy access to Microsoft Windows. Importing
EAD-encoded finding aids into a relational database will not usually
prove burdensome. But serving out databases over the web might. ADO
works with just about any ODBC-compliant database. If you're using
MS Access or MS SQL Server, that's not a problem. But if you have
a significant investment in a Unix-based database such as Oracle
or Sybase, ADO may not be much help. You can still import and export
EAD into Oracle or Sybase on Windows using ADO, but you won't be
able to do it on Unix.
Next: What you'll need to get started

Have any comments on this tutorial? Submit them
Here.
Copyright © 2001 UC Regents.
All rights reserved.
Last update August 4, 2001.
|