Introduction:
XPath is an XML technology used to retrieve elements from XML documents. Because XML documents are structured, an XPath expression can locate and retrieve elements, attributes, or values from XML files. XPath is similar to SQL in that it queries data, but it has its own syntax and rules. XML is widely used for data storage and data transfer, so XPath is useful for quickly fetching data from bulky XML documents. Java 5 introduced the javax.xml.xpath package to provide an engine- and object-model-independent XPath library. Among other products, Xalan 2.7 and Saxon 8 include an implementation of this library.
How it Works with Java:
Let's look at an example. Here is an XML document with some book data.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:bookstore xmlns:ns2="learn.java.xml">
    <bookList>
        <book author="author1">
            <title>title1</title>
            <publisher>publisher1</publisher>
            <isbn>001-0000001</isbn>
        </book>
        <book author="author2">
            <title>title3</title>
            <publisher>publisher1</publisher>
            <isbn>002-0000001</isbn>
        </book>
        <book author="author1">
            <title>title2</title>
            <publisher>publisher1</publisher>
            <isbn>001-0000002</isbn>
        </book>
        <book author="author1">
            <title>title4</title>
            <publisher>publisher2</publisher>
            <isbn>001-0000004</isbn>
        </book>
        <book author="author2">
            <title>title5</title>
            <publisher>publisher1</publisher>
            <isbn>002-0000005</isbn>
        </book>
    </bookList>
</ns2:bookstore>
In Java, using XPath takes a few simple steps:
1. Get an XPath object from XPathFactory:
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
2. Compile the expression into an XPathExpression:
XPathExpression expr = xpath.compile(xpathVal);
3. Evaluate the XPathExpression:
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
The evaluate method returns a different data type depending on its second argument. Possible values are:
- XPathConstants.NODESET
- XPathConstants.NODE
- XPathConstants.BOOLEAN
- XPathConstants.NUMBER
- XPathConstants.STRING
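To make the effect of the second argument concrete, here is a minimal sketch that evaluates expressions against a small in-memory document and casts each result to the type that matches the requested constant. The class name XPathReturnTypes, the helper eval, and the trimmed, namespace-free sample XML are assumptions made for this illustration only.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathReturnTypes {

    // Hypothetical, namespace-free sample used only for this sketch.
    static final String XML =
        "<bookstore><bookList>"
      + "<book author='author1'><title>title1</title><isbn>001-0000001</isbn></book>"
      + "<book author='author2'><title>title3</title><isbn>002-0000001</isbn></book>"
      + "</bookList></bookstore>";

    // Parses the sample and evaluates expr, asking for the given return type.
    static Object eval(String expr, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        // NODESET -> org.w3c.dom.NodeList with every matching node
        NodeList books = (NodeList) eval("//book", XPathConstants.NODESET);
        // NODE -> a single org.w3c.dom.Node (here an Element)
        Element first = (Element) eval("//book", XPathConstants.NODE);
        // BOOLEAN -> java.lang.Boolean, true when the node set is not empty
        Boolean hasAuthor2 = (Boolean) eval("//book[@author='author2']", XPathConstants.BOOLEAN);
        // NUMBER -> java.lang.Double
        Double count = (Double) eval("count(//book)", XPathConstants.NUMBER);
        // STRING -> java.lang.String, the string value of the first match
        String isbn = (String) eval("//book[@author='author2']/isbn", XPathConstants.STRING);

        System.out.println("NODESET size: " + books.getLength());
        System.out.println("NODE author:  " + first.getAttribute("author"));
        System.out.println("BOOLEAN:      " + hasAuthor2);
        System.out.println("NUMBER:       " + count);
        System.out.println("STRING:       " + isbn);
    }
}
```

The cast has to match the requested constant; asking for XPathConstants.NUMBER and casting to String, for example, fails at runtime with a ClassCastException.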
Here is an example implementation covering different cases:
package learn.java.xml;

import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class LearnXPath {

    private static final String XML_FILE_PATH = "src/learn/java/xml/bookstore.xml";

    // Expressions evaluated as node sets
    private static final String[] XPATH_NODE = {
        "/bookstore/bookList/book[@author='author1']/isbn", // attribute search with an absolute node path
        "//book[@author='author1']/isbn",                   // attribute search without an absolute node path
        "//book[publisher='publisher1']/isbn",              // search by child node value
        "//book[@author='author1']/*"                       // attribute search that receives all child nodes
    };

    // Expressions evaluated as a single string
    private static final String[] XPATH_STRING = {
        "/bookstore/bookList/book[@author='author2']/isbn/text()",
        "//book[@author='author2']/isbn/text()",
        "//book[publisher='publisher2']/isbn/text()"
    };

    public static void main(String... args) {
        new LearnXPath().doMain(args);
    }

    private void doMain(String... args) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(XML_FILE_PATH);
            // get node lists from the XML using XPath
            for (String expression : XPATH_NODE) {
                getNodeListByXPath(doc, expression);
            }
            // get String values from the XML using XPath
            for (String expression : XPATH_STRING) {
                getStringByXPath(doc, expression);
            }
        } catch (SAXException | IOException | ParserConfigurationException ex) {
            Logger.getLogger(LearnXPath.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private void getNodeListByXPath(Document doc, String xpathVal) {
        try {
            XPathFactory xPathfactory = XPathFactory.newInstance();
            XPath xpath = xPathfactory.newXPath();
            XPathExpression expr = xpath.compile(xpathVal);
            NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            Logger.getLogger(LearnXPath.class.getName()).log(Level.INFO, "Number of nodes = {0}", nl.getLength());
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nl.getLength(); i++) {
                Node node = nl.item(i);
                // the first child of each matched element is its text node
                sb.append(node.getFirstChild().getNodeValue()).append(",\t");
            }
            Logger.getLogger(LearnXPath.class.getName()).log(Level.INFO, "ISBN Code = {0}", sb);
        } catch (XPathExpressionException ex) {
            Logger.getLogger(LearnXPath.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private void getStringByXPath(Document doc, String xpathVal) {
        try {
            XPathFactory xPathfactory = XPathFactory.newInstance();
            XPath xpath = xPathfactory.newXPath();
            XPathExpression expr = xpath.compile(xpathVal);
            String isbn = (String) expr.evaluate(doc, XPathConstants.STRING);
            Logger.getLogger(LearnXPath.class.getName()).log(Level.INFO, "ISBN Code = {0}", isbn);
        } catch (XPathExpressionException ex) {
            Logger.getLogger(LearnXPath.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
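One detail of the sample XML deserves attention: the root element ns2:bookstore is bound to the learn.java.xml namespace, while the absolute expressions above start with an unprefixed /bookstore step. Whether that step matches depends on how the parser is configured, and with a namespace-aware parser an unprefixed name never matches a namespaced element. The following is a minimal sketch of one way to address the root element explicitly, assuming a namespace-aware DocumentBuilderFactory and a hand-written NamespaceContext. The class name NamespaceAwareXPath, the helper count, and the trimmed in-memory XML are hypothetical and not part of the original example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceAwareXPath {

    // Trimmed copy of the bookstore document, kept in memory for the sketch.
    static final String XML =
        "<ns2:bookstore xmlns:ns2='learn.java.xml'><bookList>"
      + "<book author='author1'><isbn>001-0000001</isbn></book>"
      + "<book author='author2'><isbn>002-0000001</isbn></book>"
      + "</bookList></ns2:bookstore>";

    // Maps the single prefix used in the expressions to its namespace URI.
    static final NamespaceContext NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "ns2".equals(prefix) ? "learn.java.xml" : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return "learn.java.xml".equals(uri) ? "ns2" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Returns how many nodes the expression selects in the sample document.
    static int count(String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespace information in the DOM
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(NS);   // lets the expression use the ns2 prefix
        NodeList nl = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nl.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("/ns2:bookstore/bookList/book[@author='author1']/isbn"));
        System.out.println(count("/bookstore/bookList/book[@author='author1']/isbn"));
        System.out.println(count("//book[@author='author1']/isbn"));
    }
}
```

Only the root element needs the prefix here, because bookList and book are declared without a namespace. That is also why the relative //book expressions are unaffected.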
Quick Tips:
- To filter nodes by attribute value, use [@attribute_name='attribute_value'].
- To filter nodes by the value of a child node, use [node_name='node_value'].
- To get only the values in the result, append "text()" to the expression.
- To get all child nodes in the result, use the wildcard character "*". The wildcard can also be used in the middle of an expression.
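The tips above can be tried out with a short, self-contained sketch. It assumes a trimmed, namespace-free version of the bookstore document held in a string. The class name XPathQuickTips and the helper select, which returns the text content of every selected node, are hypothetical names introduced for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQuickTips {

    // Hypothetical, namespace-free sample used only for this sketch.
    static final String XML =
        "<bookstore><bookList>"
      + "<book author='author1'><title>title1</title>"
      + "<publisher>publisher1</publisher><isbn>001-0000001</isbn></book>"
      + "<book author='author2'><title>title3</title>"
      + "<publisher>publisher1</publisher><isbn>002-0000001</isbn></book>"
      + "<book author='author1'><title>title4</title>"
      + "<publisher>publisher2</publisher><isbn>001-0000004</isbn></book>"
      + "</bookList></bookstore>";

    // Evaluates expr as a node set and collects the text content of each node.
    static List<String> select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nl = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            values.add(nl.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // filter by attribute value
        System.out.println(select("//book[@author='author1']/isbn"));
        // filter by child node value, selecting the text nodes with text()
        System.out.println(select("//book[publisher='publisher2']/isbn/text()"));
        // wildcard at the end: all child elements of the matching book
        System.out.println(select("//book[@author='author2']/*"));
        // wildcard in the middle: any element between bookstore and book
        System.out.println(select("/bookstore/*/book/title"));
    }
}
```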