parser

Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. Read more at Wikipedia.

  • It took some hours of hard work to find a workaround for this issue. I hope it will help some of you,
    as this simple issue should be quite common.

    Here is a sample XML file that reveals the issue (sample.xml):

        <?xml version="1.0" encoding="UTF-8"?>
        <address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://www.example.com/test"
                xsi:schemaLocation="http://www.example.com/test sample.xsd">
          <name>name</name>
          <street>street</street>
          <city>city</city>
          <country>country</country>
        </address>

    A very simple XSD schema (sample.xsd):

        <?xml version="1.0" encoding="UTF-8" standalone="no"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="http://www.example.com/test"
                   xmlns="http://www.example.com/test">
            <xs:element name="address">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" type="xs:string" />
                        <xs:element name="street" type="xs:string" />
                        <xs:element name="city" type="xs:string" />
                        <xs:element name="country" type="xs:string" />
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:schema>

    And a simple Java client, using JUnit 4:

        import java.io.InputStream;

        import javax.xml.parsers.DocumentBuilder;
        import javax.xml.parsers.DocumentBuilderFactory;

        import org.apache.commons.jxpath.JXPathContext;
        import org.junit.Assert;
        import org.junit.Test;
        import org.w3c.dom.Document;

        public class JXpath12NameSpaceIssue {

          @Test
          public void testCountNonWorkingXML() {
            InputStream xmlStream = this.getClass().getResourceAsStream("/sample.xml");

            try {
              JXPathContext context = this.getJXPathContext(xmlStream);
              // count() yields an XPath number, cast here to Double
              Double value = (Double) context.getValue("count(//name)");
              Assert.assertEquals(1, value, 0.0);
            } catch (Exception e) {
              Assert.fail(e.getMessage());
            }
          }

          public JXPathContext getJXPathContext(InputStream inputStream) {
            try {
              DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
              factory.setValidating(false); // this is for XML with a DTD only!
              factory.setNamespaceAware(true); // with a namespace in the XML, true or false makes no difference here
              factory.setFeature("http://apache.org/xml/features/validation/schema", true);

              DocumentBuilder builder = factory.newDocumentBuilder();
              // JXPathErrorHandler is an org.xml.sax.ErrorHandler that is not shown in this post
              builder.setErrorHandler(new JXPathErrorHandler());

              Document document = builder.parse(inputStream);
              JXPathContext context = JXPathContext.newContext(document);

              context.setLenient(true);
              return context;
            } catch (Throwable throwable) {
              throwable.printStackTrace();
            }
            return null;
          }
        }
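
    The JXPathErrorHandler class used by the client is not part of the post. A minimal sketch of what such a handler could look like is given here, assuming it only needs to collect warnings and rethrow errors; the behaviour and the getWarnings() accessor are hypothetical and not the author's code.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in for the handler referenced in the client above.
public class JXPathErrorHandler implements ErrorHandler {

    private final List<String> warnings = new ArrayList<>();

    // Warnings are recorded and parsing continues.
    @Override
    public void warning(SAXParseException e) {
        warnings.add(format(e));
    }

    // Recoverable errors (for example schema validation errors) abort the parse.
    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e;
    }

    // Fatal errors (for example malformed XML) abort the parse as well.
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e;
    }

    public List<String> getWarnings() {
        return warnings;
    }

    private static String format(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
    }
}
```

    Rethrowing from error() matters here: without an error handler that throws, schema validation errors would not stop builder.parse() from returning a document.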


    The line Double value = (Double) context.getValue("count(//name)"); always makes the
    test case fail, because context.getValue("count(//name)") returns 0.0 instead of 1.0.

    As soon as you remove the default namespace declaration (xmlns="http://www.example.com/test")
    from the root element of the XML file:

        <?xml version="1.0" encoding="UTF-8"?>
        <address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://www.example.com/test"
                xsi:schemaLocation="http://www.example.com/test sample.xsd">


    the code returns the correct value, 1.0. I found the explanation on the internet, thanks to Google.

    From http://www.mail-archive.com/[...]/msg07865.html:

    JXPath 1.2 handles namespaces somewhat differently from JXPath 1.1. It
    is following the XPath specification more closely. The specification
    describes the procedure of matching a name by comparing so-called
    expanded names. An expanded name is a combination of a local name and a
    namespace URI. Quote from the spec: "Two expanded-names are equal if
    they have the same local part, and either both have a null namespace
    URI or both have non-null namespace URIs that are equal." The notion of
    default namespace applies to elements of an XML document, but does not
    apply to XPaths. Quote: "if the QName does not have a prefix, then the
    namespace URI is null (this is the same way attribute names are
    expanded). It is an error if the QName has a prefix for which there is
    no namespace declaration in the expression context."

    To remedy the situation, do the following two things:

    1. Register the namespace with the JXPathContext:
    context.registerNamespace("schema", "http://www.verticon.com/react2/schema");

    Namespaces do not apply to objects, unless, of course, those objects are handled by
    custom NodePointers that are made namespace-aware. The standard distribution of
    JXPath does not contain any such NodePointers. As far as the interpretation of XPaths
    on XML documents is concerned, we are bound by the XPath 1.0 standard.
    On the other hand, the standard does not say anything about applying XPaths to any
    non-XML object models, therefore we were free to make pretty much arbitrary choices.
    One of those choices was to ignore namespaces.
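
    The expanded-name rule quoted above can be observed without JXPath. The sketch below uses only the JDK's DOM and javax.xml.xpath APIs (a swapped-in API, not JXPath), with an inlined one-element version of sample.xml; the class name ExpandedNameDemo and its helper methods are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ExpandedNameDemo {

    // Reduced copy of sample.xml: a default namespace and a single child element.
    static final String XML =
            "<address xmlns=\"http://www.example.com/test\"><name>name</name></address>";

    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns "{namespaceURI}localName" for the first child of the root element.
    public static String expandedNameOfFirstChild() {
        Node child = parse().getDocumentElement().getFirstChild();
        return "{" + child.getNamespaceURI() + "}" + child.getLocalName();
    }

    // Evaluates a numeric XPath expression with no prefixes registered at all.
    public static double number(String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The element inherits the default namespace of the document.
        System.out.println(expandedNameOfFirstChild());
        // An unprefixed XPath name has a null namespace URI, so nothing matches.
        System.out.println(number("count(//name)"));
        // Comparing only the local part ignores the namespace.
        System.out.println(number("count(//*[local-name()='name'])"));
    }
}
```

    The first XPath returns 0.0 and the second 1.0: the element's expanded name carries the namespace URI, while the unprefixed name in the XPath does not.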

    More details, from the JXPath documentation:

    When using namespaces, it is important to remember that XPath matches qualified
    names (QNames) based on the namespace URI, not on the prefix. Therefore the XPath
    "//foo:bar" may not find a node named "foo:bar" if the prefix "foo" in the context
    of the node and in the execution context of the XPath are mapped to different URIs.
    Conversely, "//foo:bar" will find the node named "biz:bar", if "foo" in the
    execution context and "biz" in the node context are mapped to the same URI.

    In order to use a namespace prefix with JXPath, that prefix should be known to
    JXPathContext. JXPathContext knows about namespace prefixes declared on the
    document element of the context node (the one passed to
    JXPathContext.newContext(node)), as well as the ones explicitly registered using
    the JXPathContext.registerNamespace(prefix, namespaceURI) method.

    So you should end up with:

        Document document = builder.parse(inputStream);
        JXPathContext context = JXPathContext.newContext(document);
        context.registerNamespace("schema", "http://www.example.com/test");
        Double value = (Double) context.getValue("count(//schema:name)");
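
    For readers without JXPath at hand, the same fix can be reproduced with the JDK's own javax.xml.xpath API, where a NamespaceContext plays the role of registerNamespace. This is a sketch on a swapped-in API, not JXPath itself; the class name DefaultNamespaceDemo and the inlined copy of sample.xml are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {

    static final String NS = "http://www.example.com/test";

    // Inlined copy of sample.xml (without the schema location).
    static final String XML = "<address xmlns=\"" + NS + "\">"
            + "<name>name</name><street>street</street>"
            + "<city>city</city><country>country</country></address>";

    // Evaluates a numeric XPath with the prefix "schema" bound to NS.
    public static double count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "schema".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "schema" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("schema").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return (Double) xpath.evaluate(expression, document, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("count(//name)"));        // unprefixed: no match
        System.out.println(count("count(//schema:name)")); // prefixed: matches by URI
    }
}
```

    The prefix "schema" never appears in the document; only the URI it is bound to has to match, which is why any prefix name works on the XPath side.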

    You can also go the other way and remove the namespaces altogether, by forcing JAXB2 to
    create XML and DOM with unqualified element names.

    In the package-info.java of the JAXB package (com.example.test in this example), change
    XmlNsForm.QUALIFIED to XmlNsForm.UNQUALIFIED:

        @javax.xml.bind.annotation.XmlSchema(
            namespace = "http://www.example.com/test",
            elementFormDefault = javax.xml.bind.annotation.XmlNsForm.UNQUALIFIED)
        package com.example.test;
  • Parser incompatibility, or parser order in the classpath: classloader loading sequence in a highly complex, distributed environment.

    Especially in a distributed environment, the order in which the JVM finds parsers while walking the classpath is a major problem. For example, different versions of the SAX parser can coexist in the classpath because of component restrictions:
    • SAX 1.0 is required for BEA WebLogic 6.0 and Xerces 1.3.1
    • SAX 2.0 is required for Xerces 1.4 and Crimson
    • Crimson is required for Apache SOAP 2.2
    • You may want to use the Apache SOAP client because it still runs on a JVM 1.2.2

    Try to draw a graph of parser dependencies: determine, for every component in your application, which DOM level is required.

    Solution 1: Use only one parser, Apache Xerces (XML), and only one transformer, Saxon or Apache Xalan (XSLT). This is not always possible, since some third-party libraries, such as weblogic.jar or oracle.jar, bundle their own parser classes.

    For example, WebLogic 6.0 contains SAX 1.0 (http://edocs.bea.com/wls/docs61/notes/migrate60to61.html), but WebLogic 6.1 contains SAX 2.0!

    • Extract from BEA website:
      "Apache XML Parser
      The XML WebLogic Server 6.0 -> 6.1 parser has been updated and is now based on the Apache Xerces 1.3.1 parser. The parser implements version 2 of the SAX and DOM interfaces. Users who used older parsers that were shipped in previous versions may receive deprecation messages."
    • Solutions:
      - Move the jar file in the classpath so that it is loaded after the SAX 2 compliant parser.
    • And?
      - To add to the confusion, the same code may run with no change at all on BEA WebLogic 6.1!
    • What to retain?
      - The order in which packages are loaded is very important when working in Java.
      - If you are lost and want to know when a package or class is loaded, you can start java with the parameter -verbose (equivalent to -verbose:class); the output goes to System.out by default.
      - Check all preconditions carefully before adopting a new third-party component, to make sure it is not incompatible with your other third-party tools.

    Solution 2: Even if you have multiple parsers in the classpath, you can still force the javax.xml factories to use the parser and transformer you want.

    Force the factory: create three files and put them at the front of the classpath, in a directory META-INF/services/. Each file contains the name of the implementation class to use:

    • The file named javax.xml.parsers.DocumentBuilderFactory may contain org.apache.xerces.jaxp.DocumentBuilderFactoryImpl (or org.apache.crimson.jaxp.DocumentBuilderFactoryImpl)
    • The file named javax.xml.parsers.SAXParserFactory may contain org.apache.xerces.jaxp.SAXParserFactoryImpl (or com.icl.saxon.aelfred.SAXParserFactoryImpl)
    • The file named javax.xml.transform.TransformerFactory may contain com.icl.saxon.TransformerFactoryImpl

    OR set three system properties when you start your Java process:

    java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl -Djavax.xml.transform.TransformerFactory=com.icl.saxon.TransformerFactoryImpl

    OR set the system properties in your code, with one call per property (not very flexible):

    System.setProperty("javax.xml.transform.TransformerFactory", "com.icl.saxon.TransformerFactoryImpl");
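
    Whichever of the three options you choose, it helps to check which implementations JAXP actually resolved. The following is a small diagnostic sketch using only standard JDK calls; the class name ParserDiagnostics and the describe() helper are made up for this example.

```java
import java.security.CodeSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class ParserDiagnostics {

    // Returns the concrete class of an object and the location it was loaded from.
    public static String describe(Object factory) {
        Class<?> type = factory.getClass();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes that ship with the JDK usually have no code source location.
        String origin = (source == null || source.getLocation() == null)
                ? "the JDK runtime"
                : source.getLocation().toString();
        return type.getName() + " (loaded from " + origin + ")";
    }

    public static void main(String[] args) {
        // Each newInstance() call runs the JAXP lookup: system property first,
        // then the service provider files, then the built-in default.
        System.out.println(describe(DocumentBuilderFactory.newInstance()));
        System.out.println(describe(SAXParserFactory.newInstance()));
        System.out.println(describe(TransformerFactory.newInstance()));
    }
}
```

    Run it with the same classpath and -D options as the real application; if the printed class or jar is not the one you expected, another parser is winning the lookup.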