
University of Oxford

Computing Services

Processing XML using Java

  Author: Barry Cornelius Date: 1st May 2001

Contents

1. XML

1.1. What is XML?

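In HTML, some data about a product might be marked up as a row of a table, using tags that say how it is to be displayed:

<table border=1>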
   <tr>
      <td><b>inkjet cartridges</b></td>
      <td><em>HP Deskjet print cartridge 51645A</em></td>
      <td bgcolor="yellow">19.50</td>
   </tr>
...
</table>

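In XML, the same data can be described using tags, chosen by the author, that say what the data is:

<consumables>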
   <product>
      <category>inkjet cartridges</category>
      <item>HP Deskjet print cartridge 51645A</item>
      <price>19.50</price>
   </product>
   ...
</consumables>

1.2. Some jargon

Here is some jargon:

<price>19.50</price>
is called an element. It is introduced by a start tag. In this example, the start tag is:
<price>
And it is terminated by an end tag:
</price>
A start tag may have one or more attributes as in:
<price units="pounds" country="UK">19.50</price>
An element may be empty as in:
<price amount="19.50"></price>
and this can be abbreviated to:
<price amount="19.50"/>
Another example is:
<unknown_price></unknown_price>
and this can be abbreviated to:
<unknown_price/>

1.3. The main goals of XML

As I see it, there are three main goals of XML:

  1. To provide a language that just describes some data. If it is necessary to describe how this data is to be displayed (and in some situations this is not the case), then this is described in a separate document.
  2. To allow the author to use their own tags. In this way, it is possible for an author to choose tags that are appropriate to the data.
  3. To insist that the language is used correctly rather than tolerating errors (as browsers commonly do with HTML), e.g.,
    • each start tag must have an end tag;
    • the values of attributes must be quoted.

1.4. Well-formed, DTDs and valid

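A document is well-formed if it obeys the syntax rules of XML, e.g., each start tag has a matching end tag. A Document Type Definition (DTD) can be used to describe which elements a document may contain and how they may be nested; a well-formed document that conforms to its DTD is said to be valid. Here is a DTD for the consumables document:

<!ELEMENT consumables (product*)>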
<!ELEMENT product (category, item, price)>
<!ELEMENT category (#PCDATA)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT price (#PCDATA)>
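As a sketch of checking validity from Java, the following program uses JAXP's DocumentBuilderFactory with validation switched on. It assumes a Java SE runtime (which includes JAXP). To keep it self-contained, the DTD above is placed in an internal DOCTYPE subset rather than in a separate consumables.dtd file; the class name DtdCheck and the sample documents are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdCheck
{
   // the DTD above, in an internal subset so that no external file is needed
   static final String DTD =
      "<!DOCTYPE consumables [\n"
    + "<!ELEMENT consumables (product*)>\n"
    + "<!ELEMENT product (category, item, price)>\n"
    + "<!ELEMENT category (#PCDATA)>\n"
    + "<!ELEMENT item (#PCDATA)>\n"
    + "<!ELEMENT price (#PCDATA)>\n"
    + "]>\n";

   // Returns the validity errors reported for pXml; an empty list means valid.
   // A document that is not well-formed makes parse() throw an exception instead.
   public static List<String> validate(String pXml) throws Exception
   {
      DocumentBuilderFactory tFactory = DocumentBuilderFactory.newInstance();
      tFactory.setValidating(true);   // check the document against its DTD
      DocumentBuilder tBuilder = tFactory.newDocumentBuilder();
      List<String> tErrors = new ArrayList<>();
      tBuilder.setErrorHandler(new ErrorHandler()
      {
         public void warning(SAXParseException pException) { }
         public void error(SAXParseException pException)
         {
            tErrors.add(pException.getMessage());   // validity error: record it
         }
         public void fatalError(SAXParseException pException) throws SAXParseException
         {
            throw pException;                       // not well-formed: give up
         }
      });
      tBuilder.parse(new InputSource(new StringReader(pXml)));
      return tErrors;
   }

   public static void main(String[] pArgs) throws Exception
   {
      String tValid = DTD + "<consumables><product><category>inkjet cartridges</category>"
         + "<item>HP Deskjet print cartridge 51645A</item><price>19.50</price></product></consumables>";
      // well-formed but not valid: price appears before item
      String tInvalid = DTD + "<consumables><product><category>inkjet cartridges</category>"
         + "<price>19.50</price><item>HP Deskjet print cartridge 51645A</item></product></consumables>";
      System.out.println(validate(tValid).isEmpty());     // true
      System.out.println(validate(tInvalid).isEmpty());   // false
   }
}
```

The program reflects the distinction between the two terms: a document that is not well-formed causes parse to throw an exception, whereas a validity error (such as price appearing before item) is passed to the error handler's error method and parsing continues.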

1.5. XML schemas

More recently, XML schemas have been introduced. They can be used to define the structure of some XML (instead of using a DTD). Two of the main differences are:

  1. An XML schema is written as an XML document (which means that there is less to learn and that it can be parsed by an XML parser).
  2. When producing an XML schema, not only can you use predefined types, your schema can also introduce new types that help you to define the structure of the XML.
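The following sketch, written against the javax.xml.validation API (available from Java 5 onwards), illustrates both points: the schema is itself an XML document, and it defines a new type, priceType, that restricts the predefined xs:decimal type to at most two digits after the decimal point. The schema and the class name SchemaCheck are illustrative assumptions, not part of the original notes.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck
{
   // A hypothetical schema for consumables. It is itself XML, and it introduces
   // a new type, priceType: an xs:decimal with at most two fraction digits.
   static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + "<xs:simpleType name='priceType'>"
    + "<xs:restriction base='xs:decimal'><xs:fractionDigits value='2'/></xs:restriction>"
    + "</xs:simpleType>"
    + "<xs:element name='consumables'><xs:complexType><xs:sequence>"
    + "<xs:element name='product' minOccurs='0' maxOccurs='unbounded'>"
    + "<xs:complexType><xs:sequence>"
    + "<xs:element name='category' type='xs:string'/>"
    + "<xs:element name='item' type='xs:string'/>"
    + "<xs:element name='price' type='priceType'/>"
    + "</xs:sequence></xs:complexType>"
    + "</xs:element>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "</xs:schema>";

   // Returns true if pXml conforms to the schema above.
   public static boolean isValid(String pXml) throws Exception
   {
      SchemaFactory tFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema tSchema = tFactory.newSchema(new StreamSource(new StringReader(XSD)));
      Validator tValidator = tSchema.newValidator();
      try
      {
         tValidator.validate(new StreamSource(new StringReader(pXml)));
         return true;
      }
      catch (SAXException pException)
      {
         // with no error handler set, the first error is thrown as an exception
         return false;
      }
   }

   // Builds a one-product consumables document with the given price.
   static String document(String pPrice)
   {
      return "<consumables><product><category>inkjet cartridges</category>"
         + "<item>HP Deskjet print cartridge 51645A</item>"
         + "<price>" + pPrice + "</price></product></consumables>";
   }

   public static void main(String[] pArgs) throws Exception
   {
      System.out.println(isValid(document("19.50")));    // true
      System.out.println(isValid(document("19.505")));   // false: three fraction digits
      System.out.println(isValid(document("cheap")));    // false: not a decimal
   }
}
```

A DTD could only say that price contains character data (#PCDATA); the schema's priceType lets the parser also reject a price such as cheap or 19.505.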

1.6. Resources about XML

  1. Jon Bosak, 'XML, Java and the future of the Web', http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm
  2. Elliotte Rusty Harold, 'XML Bible', IDG Books, 1999, 0-7645-3236-7.
  3. Charles Goldfarb and Paul Prescod, 'The XML Handbook', Prentice Hall, 2000, 0-13-014714-1.
  4. Brian E. Travis, 'XML and SOAP Programming for BizTalk Servers', Microsoft Press, 2000, 0-7356-1126-2.
  5. Some of the main WWW pages about XML are http://www.w3c.org/XML/, http://www.xml.com/, http://www.xml.org/, http://www.ibm.com/xml/, http://www.ibiblio.org/xml/, http://www.oasis-open.org/, http://www.oasis-open.org/cover/ and http://www.ucc.ie/xml/.

2. Some XML applications

  1. CDF, Channel Definition Format, http://msdn.microsoft.com/workshop/delivery/cdf/reference/CDF.asp.
  2. CML, Chemical Markup Language, http://www.xml-cml.org/.
  3. ebXML, an XML for electronic business, http://www.ebxml.org/.
  4. GeoTech-XML, Geotechnical Engineering, http://geotech.civen.okstate.edu/Gml/.
  5. MathML, Mathematical Markup Language, http://www.w3.org/Math/.
  6. SMIL, Synchronized Multimedia Integration Language, http://www.w3.org/AudioVideo/.
  7. SVG, Scalable Vector Graphics, http://www.w3.org/Graphics/SVG/.
  8. TexML, http://www.alphaworks.ibm.com/tech/texml.
  9. XGL, http://www.xglspec.org/.
  10. XHTML, a rewrite of HTML as strict XML, http://www.w3.org/MarkUp/.

3. XSL Transformations

3.1. XSL

3.2. The consumables element

<xsl:template match="consumables">
   <html>
   <body>
   <table>
      <xsl:apply-templates/>
   </table>
   </body>
   </html>
</xsl:template>

3.3. The product element

<xsl:template match="product">
   <tr>
   <td><xsl:value-of select="category"/></td>
   <td><xsl:value-of select="item"/></td>
   <td><xsl:value-of select="price"/></td>
   </tr>
</xsl:template>
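One way to try these two templates outside a browser is with the XSLT processor that is part of JAXP (the javax.xml.transform package). In this sketch the templates are wrapped in an xsl:stylesheet element that uses the W3C XSLT 1.0 namespace; the class name ApplyXsl and the inline sample document are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ApplyXsl
{
   // the two templates above, inside an XSLT 1.0 xsl:stylesheet element
   static final String XSL =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='html'/>"
    + "<xsl:template match='consumables'>"
    + "<html><body><table><xsl:apply-templates/></table></body></html>"
    + "</xsl:template>"
    + "<xsl:template match='product'>"
    + "<tr><td><xsl:value-of select='category'/></td>"
    + "<td><xsl:value-of select='item'/></td>"
    + "<td><xsl:value-of select='price'/></td></tr>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

   // Applies XSL to pXml and returns the resulting HTML as a string.
   public static String transform(String pXml) throws Exception
   {
      Transformer tTransformer = TransformerFactory.newInstance()
         .newTransformer(new StreamSource(new StringReader(XSL)));
      StringWriter tWriter = new StringWriter();
      tTransformer.transform(new StreamSource(new StringReader(pXml)),
                             new StreamResult(tWriter));
      return tWriter.toString();
   }

   public static void main(String[] pArgs) throws Exception
   {
      String tXml = "<consumables><product><category>inkjet cartridges</category>"
         + "<item>HP Deskjet print cartridge 51645A</item>"
         + "<price>19.50</price></product></consumables>";
      System.out.println(transform(tXml));
   }
}
```

No template for "/" is needed here: XSLT 1.0 has a built-in rule for the root node that applies templates to its children, so processing reaches the consumables template.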

3.4. Other aspects of XSL

There are many other aspects to the use of XSL to transform documents. These include the possibilities of:

  1. re-ordering the processing of the elements of an XML document;
  2. looping over elements;
  3. deciding which elements to include;
  4. sorting with respect to some element.
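As an illustration of items 2 to 4, here is a hypothetical stylesheet, embedded in a Java program and run with JAXP's javax.xml.transform API, that loops over the products with xsl:for-each, sorts them by price with xsl:sort, and uses xsl:if to include only products costing less than 20. The second and third products, and the class name SortedXsl, are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SortedXsl
{
   static final String XSL =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'>"
    + "<xsl:for-each select='consumables/product'>"    // looping over elements
    + "<xsl:sort select='price' data-type='number'/>"   // sorting by price
    + "<xsl:if test='price &lt; 20'>"                    // deciding what to include
    + "<xsl:value-of select='item'/><xsl:text>&#10;</xsl:text>"
    + "</xsl:if>"
    + "</xsl:for-each>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

   // hypothetical sample data: three products in no particular order
   static final String XML =
      "<consumables>"
    + "<product><category>inkjet cartridges</category>"
    + "<item>HP Deskjet print cartridge 51645A</item><price>19.50</price></product>"
    + "<product><category>toner</category>"
    + "<item>laser toner cartridge</item><price>45.00</price></product>"
    + "<product><category>paper</category>"
    + "<item>A4 paper, 500 sheets</item><price>3.99</price></product>"
    + "</consumables>";

   // Applies the stylesheet pXsl to the document pXml and returns the output.
   public static String transform(String pXsl, String pXml) throws Exception
   {
      Transformer tTransformer = TransformerFactory.newInstance()
         .newTransformer(new StreamSource(new StringReader(pXsl)));
      StringWriter tWriter = new StringWriter();
      tTransformer.transform(new StreamSource(new StringReader(pXml)),
                             new StreamResult(tWriter));
      return tWriter.toString();
   }

   public static void main(String[] pArgs) throws Exception
   {
      // prints the paper, then the HP cartridge; the toner is excluded
      System.out.print(transform(XSL, XML));
   }
}
```

Note that xsl:sort must be the first child of xsl:for-each, and that data-type='number' is needed for a numeric rather than an alphabetical ordering of the prices.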

4. Three approaches

Three approaches for processing an XML document will now be considered.

4.1. Approach One: using a tool to process XML

We could write a program to process an XML document. The program could be used to produce HTML which we could then make available on the WWW. More details about this are given in Section 5.

The problem with using a tool is that we have to remember to run it every time the XML document is updated. The remaining two approaches enable the user of a browser to work directly on the XML document.

4.2. Approach Two: processing XML with a JavaServer page

If a webserver supports the running of server-side programs, then a browser visiting a page could trigger a webserver to run a program that reads an XML document and produces HTML. Although this could be done using a CGI program or by running a PHP script, Section 6 looks at how a Java program could be run by a webserver.

4.3. Approach Three: get the browser to process the XML

Increasingly, browsers are providing support for the use of XML and XSL. So Section 7 looks at how they are supported by Microsoft's Internet Explorer and Mozilla.

5. Using a standalone tool

5.1. Creating a tool in the Java programming language

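A SAX parser reads an XML document and, as it does so, calls methods of a handler object. The following program outputs the name of each element of the file consumables.xml:

import java.io.FileReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Program
{
   public static void main(String[] pArgs) throws Exception
   {
      // a SAX2 parser: here, the one provided by Apache Xerces-J
      XMLReader tXMLReader = new org.apache.xerces.parsers.SAXParser();
      // the parser calls the methods of tHandler as it reads the document
      Handler tHandler = new Handler();
      tXMLReader.setContentHandler(tHandler);
      FileReader tFileReader = new FileReader("consumables.xml");
      InputSource tInputSource = new InputSource(tFileReader);
      tXMLReader.parse(tInputSource);
   }
}

class Handler extends DefaultHandler
{
   // called at each start tag: output the name of the element
   public void startElement(String pURI, String pLocalName,
                            String pQualifiedName, Attributes pAttributes)
   {
      System.out.println(pLocalName);
   }
}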
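The program above depends on the Xerces class org.apache.xerces.parsers.SAXParser. As a sketch of a more portable alternative, JAXP (part of the Java platform since version 1.4) lets a program ask a factory for whichever SAX parser is installed. The class names below and the inline sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementNames
{
   // collects the local name of every element, in document order
   static class NameCollector extends DefaultHandler
   {
      final List<String> iNames = new ArrayList<>();

      @Override
      public void startElement(String pURI, String pLocalName,
                               String pQualifiedName, Attributes pAttributes)
      {
         iNames.add(pLocalName);
      }
   }

   // Parses pXml and returns the names of its elements.
   public static List<String> names(String pXml) throws Exception
   {
      SAXParserFactory tFactory = SAXParserFactory.newInstance();
      // without namespace processing, pLocalName may be the empty string
      tFactory.setNamespaceAware(true);
      SAXParser tParser = tFactory.newSAXParser();
      NameCollector tCollector = new NameCollector();
      tParser.parse(new InputSource(new StringReader(pXml)), tCollector);
      return tCollector.iNames;
   }

   public static void main(String[] pArgs) throws Exception
   {
      System.out.println(names("<consumables><product><category>inkjet cartridges</category>"
         + "<item>HP Deskjet print cartridge 51645A</item><price>19.50</price></product></consumables>"));
      // prints [consumables, product, category, item, price]
   }
}
```

The call to setNamespaceAware matters because a JAXP factory is not namespace-aware by default, and SAX only guarantees to supply a local name when namespace processing is being performed.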

5.2. Using a tool to create HTML from XML and XSL
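As a sketch of such a tool, the following program applies an XSLT stylesheet to an XML file and writes the resulting HTML to a file, using the javax.xml.transform API. The class name XmlToHtml and its command-line arguments are assumptions made for this example.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtml
{
   // Applies the stylesheet in pXslFile to pXmlFile, writing the result to pHtmlFile.
   public static void run(File pXmlFile, File pXslFile, File pHtmlFile) throws Exception
   {
      TransformerFactory tFactory = TransformerFactory.newInstance();
      Transformer tTransformer = tFactory.newTransformer(new StreamSource(pXslFile));
      tTransformer.transform(new StreamSource(pXmlFile), new StreamResult(pHtmlFile));
   }

   public static void main(String[] pArgs) throws Exception
   {
      if (pArgs.length != 3)
      {
         System.err.println("usage: java XmlToHtml input.xml stylesheet.xsl output.html");
         System.exit(1);
      }
      run(new File(pArgs[0]), new File(pArgs[1]), new File(pArgs[2]));
   }
}
```

For example: java XmlToHtml consumables.xml consumables.xsl consumables.html. Passing File objects (rather than readers) tells the parser where each file is, so that a relative reference in the XML, such as a SYSTEM identifier naming consumables.dtd, is resolved relative to the XML file.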

5.3. Resources about SAX, DOM and XSLT

XML parsers (including the SAX and DOM APIs) and XSLT processors are available from:

  1. http://xml.apache.org/xerces-j/ and http://xml.apache.org/xalan/.
  2. http://msdn.microsoft.com/xml/default.asp.
  3. http://users.iclway.co.uk/mhkay/saxon/.
  4. http://www.jclark.com/xml/xp/ and http://www.jclark.com/xml/xt.html.

Some articles and books about SAX, DOM and XSLT include:

  1. David Megginson, 'SAX2: Quick Start', http://www.megginson.com/SAX/Java/quick-start.html
  2. Nazmul Idris, 'XML and Java Tutorial Part 1', http://www.developerlife.com/xmljavatutorial1/default.htm
  3. Brett McLaughlin, 'Java and XML', O'Reilly, 2000, 0-596-00016-2.
  4. Brett McLaughlin, 'All about JAXP: Sun's Java API for XML Parsing', http://www.ibm.com/software/developer/library/x-jaxp/index.html
  5. Dave Pawson, 'XSL Frequently Asked Questions', http://www.dpawson.co.uk/xsl/xslfaq.html
  6. Ken Holman, 'What is XSLT?', http://www.xml.com/pub/a/2000/08/holman/index.html
  7. Paul Sandoz, 'FOP Slide Kit', http://www.sun.com/software/xml/developers/fop/
  8. Doug Tidwell, 'Tutorial: Transforming XML documents [into HTML, SVG, and PDF]', http://www.ibm.com/xml/

6. Using JavaServer pages

6.1. Servlets

6.2. JavaServer pages

6.3. Using a JavaBean from a JSP

6.4. Using a tag library in a JSP

6.5. Resources about JSPs

  1. Sun's main page about JavaServer pages is at: http://java.sun.com/products/jsp/
  2. Jakarta's main page about tomcat is at: http://jakarta.apache.org/tomcat/
  3. 'JavaServer Pages Tag Libraries', http://java.sun.com/products/jsp/taglibraries.html
  4. 'Developing XML Solutions with JavaServer Pages Technology', http://java.sun.com/products/jsp/html/JSPXML.html
  5. Duane K. Fields and Mark A. Kolb, 'Web Development with JavaServer Pages', Manning Publications, 2000, 1-884777-99-6.

7. Support in browsers

As time moves on, WWW browsers play catch-up with new developments. We now look at the support for XML (and XSL) that is provided in recent versions of browsers: we look at version 5.5 of Internet Explorer (IE5.5) and Milestone 18 of Mozilla augmented by the code of Mozilla's XSLT project (M18XSLT). Milestone 18 was released in October 2000. Since the release of Mozilla 0.9.1 on 6th June 2001, the code of the XSLT project has formed part of Mozilla. The latest release of Mozilla can be downloaded from http://www.mozilla.org/releases/.

7.1. Use of CSS1 with XML

Both IE5.5 and M18XSLT provide support for XML with CSS1 (version 1 of Cascading Style Sheets). To use CSS, you need to add an xml-stylesheet processing instruction to the XML document, as in the file http://www.dur.ac.uk/barry.cornelius/java/xml.processing/code/css1.xml:

<?xml version="1.0" standalone="no"?>
<?xml-stylesheet type="text/css" href="css1.css"?>
<!DOCTYPE consumables SYSTEM "consumables.dtd">
<consumables>
...
</consumables>

The type attribute of this xml-stylesheet processing instruction says that this XML document is to be displayed using a CSS stylesheet, and the href attribute gives the URL of the file containing the actual CSS rules.
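A program can find this processing instruction too. As a sketch, the following SAX handler (obtained through JAXP) records the href given in the first xml-stylesheet processing instruction. SAX passes everything after the instruction's target to processingInstruction as a single string, so the href is picked out with a simple regular expression; that this is adequate is an assumption that holds for simple cases like this one. The sample omits the DOCTYPE line so that no consumables.dtd file is needed, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StylesheetPI
{
   // matches href="..." or href='...' within the instruction's data
   static final Pattern HREF = Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']");

   // Returns the href of the first xml-stylesheet processing instruction, or null.
   public static String stylesheetHref(String pXml) throws Exception
   {
      final String[] tHref = { null };
      DefaultHandler tHandler = new DefaultHandler()
      {
         @Override
         public void processingInstruction(String pTarget, String pData)
         {
            if (pTarget.equals("xml-stylesheet") && tHref[0] == null)
            {
               Matcher tMatcher = HREF.matcher(pData);
               if (tMatcher.find())
               {
                  tHref[0] = tMatcher.group(1);
               }
            }
         }
      };
      SAXParserFactory.newInstance().newSAXParser()
         .parse(new InputSource(new StringReader(pXml)), tHandler);
      return tHref[0];
   }

   public static void main(String[] pArgs) throws Exception
   {
      // the DOCTYPE line is omitted so that no consumables.dtd file is needed
      String tXml = "<?xml version=\"1.0\"?>\n"
         + "<?xml-stylesheet type=\"text/css\" href=\"css1.css\"?>\n"
         + "<consumables></consumables>";
      System.out.println(stylesheetHref(tXml));   // css1.css
   }
}
```

The XML declaration on the first line is not reported as a processing instruction, so only the xml-stylesheet instruction reaches the handler.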

The file css1.css might contain:

consumables { display: block }
product     { display: block }
category    { font-size: x-large; 
              color: black }
item        { font-size: large; 
              color: red }
price       { background-color: yellow; 
              color: blue }

7.2. Use of CSS2 with XML

CSS2 introduces some new features, such as the ability to display elements as a table. So instead we could use the file http://www.dur.ac.uk/barry.cornelius/java/xml.processing/code/css2.xml:

<?xml version="1.0" standalone="no"?>
<?xml-stylesheet type="text/css" href="css2.css"?>
<!DOCTYPE consumables SYSTEM "consumables.dtd">
<consumables>
...
</consumables>

where the file css2.css contains:

consumables { display: table }
product     { display: table-row }
category    { display: table-cell; font-size: x-large; 
              color: black }
item        { display: table-cell; font-size: large; 
              color: red; text-indent: 0.1in }
price       { display: table-cell; background-color: yellow; 
              color: blue; text-align: right; text-indent: 0.1in }

Although these instructions are obeyed correctly by M18XSLT, they are not understood by IE5.5.

7.3. Use of XSL with XML

Both IE5.5 and M18XSLT can use XSL to process an XML document. You need to insert the following xml-stylesheet processing instruction in your XML document:

<?xml-stylesheet type="text/xsl" href="consumables.xsl"?>

Inconveniently, there are two important differences in the XSL that is understood by IE5.5 and M18XSLT:

  1. IE5.5 requires a rule in the XSL stylesheet for processing the outermost level of an XML document. So IE5.5 needs the extra rule:
    <xsl:template match="/">
       <xsl:apply-templates/>
    </xsl:template>
    
  2. IE5.5 requires the following element to identify the namespace being used:
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
    
    whereas an M18XSLT XSL document requires:
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    

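This is because IE5.5 uses an old version of Microsoft's XML Parser (MSXML), which implements an early working draft of XSL (hence the namespace http://www.w3.org/TR/WD-xsl) rather than the XSLT 1.0 Recommendation. The WWW page http://www.netcrucible.com/xslt/msxml-faq.htm explains how to obtain Version 3.0 of MSXML (November 2000) and get IE5.5 to use it.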

The files http://www.dur.ac.uk/barry.cornelius/java/xml.processing/code/xsl.xml and http://www.dur.ac.uk/barry.cornelius/java/xml.processing/code/consumables.xsl contain versions that work with both M18XSLT and the modified version of IE5.5.

8. Comparison of the three approaches

9. Examples in the curriculum

9.1. Computer Science projects during 1999-2000

9.2. Engineering projects during 2000-2001



This page is copyrighted