I have lately been working on refining an XML transform that will render our EAD finding aids in MARC. I think the end result has turned out OK, but it is far from optimal. I am hampered by several things:
- I’m not very expert at manipulating XML. A transform is basically a logic puzzle. I <3 logic puzzles and I think I’m picking it up pretty fast, but my bag of tricks mostly consists of what I can extrapolate from other people’s work. One evening as I was fiddling with an aggravating portion of this transform, Significant Other wandered by and casually suggested that I use regular expressions instead. I (rather impolitely) declined the suggestion, but he’s right – my methods are terribly inefficient. If I want to do this well, I’ll need to take some classes.
- EAD incorporates some questionable formatting. One example is allowing nodes (fields) to contain both text of their own and subnodes (subfields) that also contain text, such as a unittitle with a unitdate nested inside it. Although various XML gurus have offered a simple workaround for this, they also tend to waggle their fingers and tell you not to format your document like that in the first place; nonetheless, there it is in the DTD. MARC may be problematic in many ways, but this particular way isn’t one of them. Everything lives in a subfield.
- Inconsistent markup makes everything harder. When the earliest of these finding aids were marked up, I think nobody had a good sense of how these files would be used in the future, how essential it is to follow the same process every time, or how much unseen code matters on the web. When the metadata is not predictable, it becomes necessary to either tinker with the end product or build all sorts of exceptions into the transform. The same is true of human-mediated retrieval, but it’s especially problematic for an automated process that can only do what you build it to do and nothing more.
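
To make the mixed-content problem concrete, here is a minimal sketch in Python rather than XSLT, so it is not how my actual transform works. It assumes a made-up finding aid fragment in which unittitle holds its own text plus a nested unitdate, and it maps them to MARC 245 subfields $a and $f. The sample record and the function names (split_unittitle, to_marc_245) are hypothetical, and the output is a simplified text rendering of a field with the indicators left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment: <unittitle> has text of its own AND a <unitdate> child.
SAMPLE = "<did><unittitle>Jane Doe papers, <unitdate>1901-1950</unitdate></unittitle></did>"


def split_unittitle(unittitle):
    """Return (title, dates) from a mixed-content <unittitle> element."""
    # The element's own text is its leading text plus the "tail" after each child.
    parts = [unittitle.text or ""]
    dates = []
    for child in unittitle:
        if child.tag == "unitdate":
            dates.append("".join(child.itertext()).strip())
        else:
            # Any other inline child is treated as part of the title (an assumption).
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    # Collapse runs of whitespace and drop the trailing comma left behind by the date.
    title = " ".join("".join(parts).split()).rstrip(" ,")
    return title, ", ".join(dates)


def to_marc_245(unittitle):
    """Build a simplified display string for a 245 field (indicators omitted)."""
    title, dates = split_unittitle(unittitle)
    field = "245 $a " + title
    if dates:
        field += ", $f " + dates
    return field


root = ET.fromstring(SAMPLE)
print(to_marc_245(root.find("unittitle")))
```

A record with no unitdate comes out with only $a, which is exactly the kind of exception that inconsistent markup forces you to plan for.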