Feedback from the problems uncovered deploying XML applications drives the evolution of the W3C standards. New versions of the standards solve real problems. Thus the migration of code to new versions of XML support may be driven by necessity rather than a desire to pick up neat new features. Applications that are centered entirely on XML (plausibly, uPortal) are forced to keep up to date.
Things would be simpler if the W3C produced new standards that were compatible with their previous standards. Unfortunately, the W3C has adopted a policy of replacing the definition of each interface with a new version that keeps the same interface name but adds methods. (For example, the transition from DOM 2 to DOM 3 added new methods to the existing interface org.w3c.dom.Document.) This means that the bundle of interfaces associated with one version of the standard is tightly coupled to a separate Jar file containing versions of the implementing classes that support the new methods.
One of the basic programming interfaces is the DOM (Document Object Model). The DOM interfaces are defined by packages of the form org.w3c.dom.* and define a set of objects and methods that provide operations on the objects. A DOM 2 standard was developed years ago, and DOM 3 component standards are now released. Driven by requirements emerging from layout management implementations, uPortal requires DOM 3 support.
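To make the DOM 2 versus DOM 3 difference concrete, here is a minimal sketch that calls two methods that exist only in the DOM Level 3 version of org.w3c.dom.Node (getTextContent and isEqualNode). The class name, helper names, and sample XML are hypothetical and chosen only for illustration. Code like this compiles and runs only when both the interfaces and the implementing classes are at the DOM 3 level.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class Dom3MethodsSketch {

    // Parse a small hypothetical document with whatever JAXP parser is installed.
    static Document sample() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(
                            "<folder><channel>News</channel></folder>")));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Node.getTextContent() was added to the Node interface in DOM Level 3.
    static String textOf(Node node) {
        return node.getTextContent();
    }

    // Node.isEqualNode() was also added in DOM Level 3.
    static boolean sameContent(Node a, Node b) {
        return a.isEqualNode(b);
    }

    public static void main(String[] args) {
        Document doc = sample();
        Node root = doc.getDocumentElement();
        System.out.println(textOf(root));
        System.out.println(sameContent(root, root.cloneNode(true)));
    }
}
```

Against a DOM 2 version of the interfaces, the two method calls above would be compile errors, which is the coupling between interface Jar and implementation Jar described earlier.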
The Apache Xerces project was formed from submissions from IBM (XML4J) and Sun (ProjectX). It represents a common codebase to which all parties can submit bugfixes and new features. Apache distributes versions of Xerces directly, but Sun distributes a version of the same code with slightly different packaging.
Starting with Java 1.4, Sun decided that XML was so important that it should be a standard part of the J2SE runtime library. However, Sun's standards require that all XML requests filter through the JAXP API, just as all database requests go through JDBC and all directory requests go through JNDI. The Apache code contains some programming interfaces with concrete classes left over from the old IBM XML4J days. So although Sun's distribution is based on Apache Xerces, Sun renames some of the classes to require everyone to go through the public JAXP interface.
Unfortunately, Sun decided to freeze the features and standards at major release boundaries. When Java 1.4.0 came out in Feb. 2002, the standards were DOM 2 and JAXP 1.2. So although bugs were fixed, these versions of the standards remained the basis for the Sun library through releases of 1.4.1 and 1.4.2 (up to 1.4.2_06). The only way to override this type of built-in functionality is to use the "endorsed" library mechanism of Java, and the only other version of the code reasonably available was the distribution from Apache.
The current version of the Xerces XML support distributed by Apache contains interface definitions based on the old DOM 2 standard, and classes that implement that standard. Apache provides an Ant build option to create a version of its current Xerces release with the DOM 3 interfaces and implementations, but it regards a library built this way as experimental Beta code. The plan is to convert to DOM 3 support in the 2.7.0 release of Xerces, which currently has no planned release date.
In the Summer of 2004, Sun finally released a new major release. Designated as 1.5 under the old system, or as J2SE 5.0 in a new naming convention, this release includes support for both DOM 3 and JAXP 1.3 as standard. In November they also released a version of the same XML library for use on earlier Java releases.
So at this moment, Sun has leapfrogged ahead of Apache. Eventually Apache will release 2.7.0 and catch up, but even then the Sun version of the code will have the advantage that it is built into Java (at least if you are running J2SE 5.0). It provides all the functionality needed for OpenSAML and Shibboleth, and some useful new features, but will require some conversion.
The proposal is to convert the uPortal project to use the new Sun version of these libraries rather than the older Apache version. If a customer is using J2SE 5.0 as his JRE, then no libraries are needed and everything will work with just the standard Java runtime. For older JREs, the five Sun jar files replace the previously distributed two Apache jar files in the /endorsed library.
This enables converting some existing code to use the JAXP factory standard instead of using the uPortal DocumentFactory API. This has the benefit that code written for uPortal can more easily be picked up and used in other environments – dependency is directly upon standard libraries rather than upon uPortal-specific APIs. This goes the other direction too – code written to these standard APIs can be picked up and dropped into uPortal. An additional benefit to the conversion is that XSD schema files can become first class programming objects.
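As an illustration of the last point, the javax.xml.validation package that arrives with JAXP 1.3 lets a compiled XSD be held as a Schema object and reused. The following is only a sketch under stated assumptions: the class name, the helper methods, and the tiny "channel" schema are hypothetical and are not taken from uPortal.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaObjectSketch {

    // Hypothetical schema: a <channel> element with a required integer "id" attribute.
    static final String CHANNEL_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='channel'>"
        + "<xs:complexType>"
        + "<xs:attribute name='id' type='xs:int' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Compile the XSD text once into a reusable Schema object.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile", e);
        }
    }

    // Validate an XML string; with no ErrorHandler set, a validation error
    // is reported by throwing SAXException.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("unexpected I/O failure", e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(CHANNEL_XSD);
        System.out.println(isValid(schema, "<channel id='7'/>"));
        System.out.println(isValid(schema, "<channel id='seven'/>"));
    }
}
```

The same Schema object can also be handed to a DocumentBuilderFactory through its setSchema method, so parsing and validation share one compiled schema.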
A customer who uses J2SE 5.0 as his JRE (and a Servlet container such as Tomcat 5.5 that supports it) has the desired level of XML support and requires no libraries.
A customer using some version of Java 1.4.x requires the Sun distribution of new XML support for old Java systems. If this is checked into the current OpenSAML and Shibboleth projects, the /endorsed directory would now have five Jar files replacing the previous two jar files:
- dom.jar (contains the org.w3c.dom interface packages)
- sax.jar (contains the org.xml.sax interface packages)
- jaxp-api.jar (contains the javax.xml interface packages)
- xercesImpl.jar (Xerces, but with the packages renamed as com.sun.org.apache.xerces...)
- xalan.jar (Xalan, but with the packages renamed as com.sun.org.apache.xalan...)
Essentially, Sun breaks the Apache xml-apis library of interfaces into three separate Jar files representing the three different interface standards (DOM, SAX, and JAXP) from three separate organizations. This seems like a sensible piece of housekeeping.
The implementing classes (org.apache...) then have their packages renamed to com.sun.org.apache... Direct use of Apache implementing classes bypasses JAXP. It is essentially the same thing as using an Oracle database class directly instead of going through JDBC. Since Sun has to maintain the same classes as Apache, they did not want to change the source. However, by renaming the packages they could be sure that any code that makes direct use of an Apache class would have to be converted.
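One way to see which packaging is actually in effect on a given JRE is to ask JAXP for its factories and print the names of the implementing classes. This is a small diagnostic sketch, not part of uPortal; the class and method names are hypothetical, and the output depends on the JRE and on whatever is in the endorsed directory or on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

class JaxpProviderCheck {

    // Name of the class JAXP selected to implement DocumentBuilderFactory.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Name of the class JAXP selected to implement TransformerFactory.
    static String transformerFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // True when a class name lives under Sun's renamed copy of the Apache packages.
    static boolean isSunRepackaged(String className) {
        return className.startsWith("com.sun.org.apache.");
    }

    public static void main(String[] args) {
        String dom = domFactoryClass();
        String xslt = transformerFactoryClass();
        System.out.println(dom + " (Sun packaging: " + isSunRepackaged(dom) + ")");
        System.out.println(xslt + " (Sun packaging: " + isSunRepackaged(xslt) + ")");
    }
}
```

Application code should not depend on either answer. The point of going through JAXP is that only the factory lookup knows which implementation is installed.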
DocumentBuilderFactory (not "new DOMParser", not org.jasig.portal.utils.DocumentFactory.getNewDocument())
The Sun approach to functional libraries is to create a factory interface with pluggable providers. JAXP is the factory interface for XML. Sun provides a set of implementing classes, but I suppose you might find an alternate source of classes to implement one or more of the XML standards.
Apache used to expose some concrete classes to perform specific functions. Some Shibboleth and OpenSAML source imports the concrete class that provides XML to DOM parsing directly, with a statement such as: import org.apache.xerces.parsers.DOMParser;
Sun doesn't want you to use direct classes, so it renamed the packages. There is still a DOMParser class, but in Sun's distribution it is com.sun.org.apache.xerces.internal.parsers.DOMParser. If you convert from Apache to Sun libraries, then the old import statements and direct use of DOMParser and a few other concrete classes will not compile.
To correct such statements, replace the direct use of classes with the JAXP factory interface. The first step is to create a DocumentBuilderFactory object. This object is then parameterized with information about the type of XML parser you want (especially the XSD Schemas it should use). Then, the DocumentBuilderFactory can be called to create one or more DocumentBuilder objects. DocumentBuilder is almost the same as DOMParser, though a few method details are different.
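The steps just described can be sketched as follows. This is a minimal illustration rather than the actual uPortal conversion: the class name, the helper methods, and the sample XML are hypothetical, and only standard JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JaxpParseSketch {

    // Step 1 and 2: create the factory and describe the parser you want.
    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // A compiled javax.xml.validation.Schema could be attached here
        // with factory.setSchema(schema) to request XSD validation.
        return factory;
    }

    // Step 3: ask the factory for a DocumentBuilder and parse with it,
    // where older code would have called "new DOMParser()".
    static Document parse(DocumentBuilderFactory factory, String xml) {
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // The same builder type also creates empty documents.
    static Document newEmptyDocument(DocumentBuilderFactory factory) {
        try {
            return factory.newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = newFactory();
        Document doc = parse(factory, "<layout><folder name='root'/></layout>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

A configured factory can be kept and asked for builders repeatedly. Whether builders themselves can be shared between threads is implementation-dependent, so a conservative conversion creates one builder per use.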
There is a similar Transformer factory interface to get an object that will convert DOM back to a string of characters (serialize the XML).
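A minimal sketch of that serialization path, assuming only the standard javax.xml.transform classes: a Transformer created without a stylesheet performs an identity transformation, so sending a DOMSource to a StreamResult writes the tree out as text. The class name, method names, and sample element are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomSerializeSketch {

    // Serialize a DOM node to a string using an identity Transformer.
    static String serialize(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    // Build a one-element hypothetical document to serialize.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element channel = doc.createElement("channel");
            channel.setAttribute("id", "7");
            doc.appendChild(channel);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```

Output details such as indentation and encoding are controlled through setOutputProperty with the constants in OutputKeys, which is where serializer options from the old code would need to be re-expressed.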
Although there are some rough one-to-one translations between old classes and new factories, the details of methods and properties are important. The existing code contains some optimizations, and the same things need to be expressed in terms of the new API's semantics.