Sunday, January 27, 2013

XPath and Java...

Introduction:

XPath is an XML technology used to retrieve elements from XML documents. Because XML documents are structured, an XPath expression can locate and retrieve elements, attributes, or values from XML files. XPath is similar to SQL in that it queries data, but it has its own syntax and rules.
XML is widely used for data storage and data transfer, so we need XPath to fetch data quickly from bulky XML documents. Java 5 introduced the javax.xml.xpath package to provide an engine- and object-model-independent XPath library. Among other products, Xalan 2.7 and Saxon 8 include an implementation of this library.

How it Works with Java:

Let's look at an example to understand it. Suppose we have an XML document with some book data.
We need to find all books by a specific author or from a specific publication. We could write a custom parser to get the desired results, but that is a lot of work, and even a small change in requirements may force us to rewrite the code. This is where XPath comes into the picture.

In Java, here are a few simple steps to use XPath:
1. Get an XPath object from XPathFactory:

      XPathFactory xPathfactory = XPathFactory.newInstance();
      XPath xpath = xPathfactory.newXPath();

2. Compile an XPathExpression, where xpathVal is the XPath expression string:

      XPathExpression expr = xpath.compile(xpathVal);

3. Evaluate the XPathExpression against the parsed XML document (doc):

      NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

The evaluate method returns a different data type depending on its second argument. Possible values are:
  1. XPathConstants.NODESET
  2. XPathConstants.NODE
  3. XPathConstants.BOOLEAN
  4. XPathConstants.NUMBER
  5. XPathConstants.STRING

Here is an example implementation for different cases:
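As a minimal sketch, the class below follows the three steps and runs one query for each return type. The sample books XML, its element and attribute names, and the class and method names (XPathReturnTypesDemo, parse, evaluate) are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathReturnTypesDemo {

    // Hypothetical sample data, invented for this sketch.
    static final String BOOKS_XML =
        "<books>"
        + "<book id=\"1\"><title>Book A</title><author>Alice</author><price>10.5</price></book>"
        + "<book id=\"2\"><title>Book B</title><author>Bob</author><price>20</price></book>"
        + "<book id=\"3\"><title>Book C</title><author>Alice</author><price>30</price></book>"
        + "</books>";

    // Parse an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Steps 1-3: get an XPath object, compile the expression, evaluate it.
    static Object evaluate(Document doc, String expression, QName returnType) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile(expression);
        return expr.evaluate(doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(BOOKS_XML);

        // NODESET: every matching node, returned as a NodeList.
        NodeList titles = (NodeList) evaluate(
            doc, "/books/book[author='Alice']/title", XPathConstants.NODESET);
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println("Title: " + titles.item(i).getTextContent());
        }

        // NODE: the first matching node only.
        Node book = (Node) evaluate(doc, "/books/book[@id='2']", XPathConstants.NODE);
        System.out.println("Node name: " + book.getNodeName());

        // BOOLEAN: the expression result converted to true or false.
        Boolean hasBob = (Boolean) evaluate(
            doc, "count(/books/book[author='Bob']) > 0", XPathConstants.BOOLEAN);
        System.out.println("Has a book by Bob: " + hasBob);

        // NUMBER: the result as a Double.
        Double total = (Double) evaluate(doc, "sum(/books/book/price)", XPathConstants.NUMBER);
        System.out.println("Total price: " + total);

        // STRING: the string value of the first matching node.
        String title = (String) evaluate(
            doc, "/books/book[@id='3']/title", XPathConstants.STRING);
        System.out.println("Title of book 3: " + title);
    }
}
```

The cast on each call has to match the constant passed as the second argument; a mismatch fails at runtime with a ClassCastException rather than at compile time.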

Quick Tips:


  • To filter nodes by an attribute value, use [@attribute_name='attribute_value'].
  • To filter nodes by the value of a child node, use [node_name='node_value'].
  • To get only the text values in the result, append text() to the expression.
  • To match any element node, use the wildcard character "*". The wildcard can also be used in the middle of an expression.
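The tips above can be tried out with a small helper, sketched here under assumptions: the sample XML, the category attribute, and the XPathTipsDemo class with its select method are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTipsDemo {

    // Hypothetical sample data, invented for this sketch.
    static final String BOOKS_XML =
        "<books>"
        + "<book category=\"java\"><title>Book A</title><author>Alice</author></book>"
        + "<book category=\"xml\"><title>Book B</title><author>Bob</author></book>"
        + "<magazine category=\"java\"><title>Mag C</title></magazine>"
        + "</books>";

    // Evaluate the expression as a node set and collect each node's text content.
    static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Filter by attribute value, returning only the text of each title.
        System.out.println(select(BOOKS_XML, "/books/book[@category='java']/title/text()"));

        // Filter by the value of a child node.
        System.out.println(select(BOOKS_XML, "/books/book[author='Bob']/title/text()"));

        // Wildcard in the middle of the expression: any child element of books.
        System.out.println(select(BOOKS_XML, "/books/*[@category='java']/title/text()"));
    }
}
```

In this sketch the wildcard query matches both the book and the magazine element, because "*" stands for any element name at that step of the path.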
