revlis.nl

October 27, 2023 — 17:16

Author: silver Category: dev web

XPath, the “XML Path Language”, is a language for querying XML. It can also be used on an HTML DOM, for example to select ‘div’ elements with a specific class. This is useful when scraping web pages, e.g. with Selenium or Playwright. It works similarly to CSS selectors.

The syntax and examples below are all XPath 1.0, since this is the version most widely supported by tools and libraries. Version 2.0 adds more types, functions and operators (there are also 3.0 and 3.1).

Syntax

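child:: selects immediate children (the default axis; '/' separates steps)
descendant:: selects children recursively (all levels below)
descendant-or-self:: selects the context node plus all its descendants ('//' is short for '/descendant-or-self::node()/')
@ selects an attribute (short for attribute::)
text() selects the text nodes of an element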
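
The difference between the child and descendant axes can be sketched with Python's standard library. Note the assumptions: `xml.etree.ElementTree` only implements a small subset of XPath (abbreviated paths, no explicit `child::` or `descendant::` axis names), and the `<root>`, `<group>` and `<item>` element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <item> directly under <root>, one nested deeper.
doc = ET.fromstring(
    "<root>"
    "<item>a</item>"
    "<group><item>b</item></group>"
    "</root>"
)

# 'item' uses the default child axis: only immediate children of <root>.
children = [e.text for e in doc.findall("item")]

# './/item' is the abbreviated descendant search: any depth below <root>.
descendants = [e.text for e in doc.findall(".//item")]

print(children)     # ['a']
print(descendants)  # ['a', 'b']
```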

Examples

Select the ‘title’ attribute of a div with class ‘myclass’

html: <div class="myclass" title="My Title">

xpath: //div[@class="myclass"]/@title

returns: ‘My Title’

Select the link with id ‘my_id’, then its text

html: <a id="my_id">foo bar</a>

xpath: //a[@id="my_id"]/descendant::text()

returns: ‘foo bar’
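
Both examples can be approximated in Python's standard library, as a rough sketch rather than a full XPath 1.0 evaluation. `xml.etree.ElementTree` supports the `[@attr="value"]` predicate but cannot return attribute or text nodes directly, so the last step of each query is done with `.get()` and `.itertext()` instead. The surrounding `<body>` and `<p>` wrappers are assumptions added so the snippet is well-formed XML; real-world HTML usually needs an HTML-aware parser.

```python
import xml.etree.ElementTree as ET

# Well-formed XHTML-style snippet; ElementTree cannot parse tag-soup HTML.
page = ET.fromstring(
    '<body>'
    '<div class="myclass" title="My Title"></div>'
    '<p><a id="my_id">foo bar</a></p>'
    '</body>'
)

# Approximates //div[@class="myclass"]/@title
div = page.find('.//div[@class="myclass"]')
title = div.get("title")

# Approximates //a[@id="my_id"]/descendant::text()
link = page.find('.//a[@id="my_id"]')
text = "".join(link.itertext())

print(title)  # My Title
print(text)   # foo bar
```

A library with a complete XPath 1.0 engine, such as lxml, should accept the original expressions unchanged.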

Testing

Queries can be tested from the command line with ‘xmllint’ (on Debian/Ubuntu: apt install libxml2-utils)

# html file:
xmllint --html --xpath '//a[@class="Results"]/@title' example.html
# actual xml, from curl:
curl 'http://restapi.adequateshop.com/api/Traveler?page=1' | \
  xmllint --xpath '/TravelerinformationResponse/travelers/Travelerinformation/name' -
