XPath (“XML Path Language”) is a query language for XML that can also be used on an HTML DOM, for example to select ‘div’ elements with a specific class. This is useful when scraping web pages, e.g. with Selenium or Playwright. It works similarly to CSS selectors.
The syntax and examples below are all XPath 1.0, since this is the version most widely supported by tools and libraries. Version 2.0 adds more types, functions and operators (there are also versions 3.0 and 3.1).
‘/’ selects child (immediate)
descendant:: selects children (recursive)
text() selects element text
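To make the difference between an immediate-child step and a recursive descendant step concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It rests on a few assumptions: ElementTree implements only a subset of XPath 1.0 (it has no ‘descendant::’ axis and no ‘text()’ step), so ‘.//’ stands in for the recursive search and the text is read through the ‘.text’ attribute. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: one <item> directly under the root, one nested deeper.
root = ET.fromstring(
    "<list>"
    "<item>top</item>"
    "<group><item>nested</item></group>"
    "</list>"
)

# "./item" matches only immediate children of the context node (like '/').
children = [e.text for e in root.findall("./item")]

# ".//item" matches at any depth below the context node (like 'descendant::').
descendants = [e.text for e in root.findall(".//item")]

print(children)
print(descendants)
```

A full XPath 1.0 engine (such as the one behind xmllint) accepts the axis names directly; this sketch only mirrors their behaviour.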
Select the ‘title’ attribute of a div with class ‘myclass’:
<div class="myclass" title="My Title">
//div[@class="myclass"]/@title
returns: ‘My Title’
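The same attribute selection can be tried out in Python. This is a sketch under stated assumptions: it uses the standard library's xml.etree.ElementTree, which supports the ‘[@class=...]’ predicate but not a trailing attribute step, so the attribute is read with ‘.get()’. The surrounding markup is invented and must be well-formed XML for this parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet; ElementTree cannot parse sloppy real-world HTML.
page = ET.fromstring(
    "<body>"
    '<div class="other" title="Ignored">a</div>'
    '<div class="myclass" title="My Title">b</div>'
    "</body>"
)

# The predicate [@class='myclass'] filters the divs; the attribute is read in
# Python because ElementTree does not support a trailing '/@title' step.
titles = [div.get("title") for div in page.findall(".//div[@class='myclass']")]
print(titles)
```

Note that the predicate compares the whole attribute value, so an element with class="myclass extra" would not match.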
Select the link with id ‘my_id’, then its text:
<a id="my_id">foo bar</a>
//a[@id="my_id"]/text()
returns: ‘foo bar’
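Selecting by id and reading the text can be sketched the same way in Python. This assumes the standard library's xml.etree.ElementTree, which has no ‘text()’ step, so the text comes from the element's ‘.text’ attribute; the extra link in the sample is made up to show that only the matching id is picked.

```python
import xml.etree.ElementTree as ET

# Made-up snippet with two links; only one carries the id we are after.
page = ET.fromstring('<p><a id="other">skip me</a> <a id="my_id">foo bar</a></p>')

# find() returns the first match or None, so guard before reading the text.
link = page.find(".//a[@id='my_id']")
link_text = link.text if link is not None else None
print(link_text)
```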
Queries can be tested from the CLI with ‘xmllint’ (apt install libxml2-utils):
# html file:
xmllint --html --xpath '//a[@class="Results"]/@title' example.html
# actual xml, from curl:
curl 'http://restapi.adequateshop.com/api/Traveler?page=1' | \
  xmllint --xpath '/TravelerinformationResponse/travelers/Travelerinformation/name' -
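If xmllint is not at hand, an absolute path like the one in the second command can be approximated offline in Python. This is only a sketch: the XML below is a made-up sample that reuses the element names from the query, and the real API response may look different. With xml.etree.ElementTree the parsed root already is the ‘TravelerinformationResponse’ element, so the path starts below it.

```python
import xml.etree.ElementTree as ET

# Made-up sample; only the element names are taken from the query above.
xml_text = """
<TravelerinformationResponse>
  <travelers>
    <Travelerinformation><name>Alice</name></Travelerinformation>
    <Travelerinformation><name>Bob</name></Travelerinformation>
  </travelers>
</TravelerinformationResponse>
"""

root = ET.fromstring(xml_text)

# The root element is the first step of the absolute path, so continue from it.
names = [e.text for e in root.findall("./travelers/Travelerinformation/name")]
print(names)
```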