XPath
October 27, 2023 — 17:16

Author: silver  Category: dev web

XPath (“XML Path Language”) is a query language for XML that can also be used on an HTML DOM, for example to select ‘div’ elements with a specific class. This is useful when scraping webpages, e.g. with Selenium or Playwright. It works similarly to CSS selectors.

The syntax and examples below are all XPath 1.0, since this is the version most widely supported by tools and libraries. Version 2.0 adds more types, functions and operators (there are also versions 3.0 and 3.1).

Syntax

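  • child:: (the default axis; steps are separated by '/') selects immediate children
  • descendant:: selects children at any depth (recursive)
  • descendant-or-self:: selects the same, plus the context node itself ('//' is short for '/descendant-or-self::node()/')
  • @ (short for attribute::) selects an attribute
  • text() selects the text nodes of an element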
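
The difference between the child and descendant axes can be tried out with Python's standard library. This is only a sketch: xml.etree.ElementTree implements a limited XPath subset (no explicit axis names such as descendant::), so the abbreviated forms './' and './/' are used here, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from a real page.
doc = ET.fromstring(
    "<root>"
    "<item id='a'/>"
    "<group><item id='b'/><item id='c'/></group>"
    "</root>"
)

# './item' uses the child axis: only immediate children of the context node.
children = [e.get("id") for e in doc.findall("./item")]

# './/item' is the abbreviated descendant form: items at any depth.
descendants = [e.get("id") for e in doc.findall(".//item")]

print(children)     # ['a']
print(descendants)  # ['a', 'b', 'c']
```

A full XPath 1.0 engine would accept 'child::item' and 'descendant::item' for the same two queries.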

Examples

Select the ‘title’ attribute of a div with class ‘myclass’

html: <div class="myclass" title="My Title">

xpath: //div[@class="myclass"]/@title

returns: ‘My Title’
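
A rough equivalent in Python, assuming only the standard library: xml.etree.ElementTree cannot return attribute nodes, so the '/@title' step is replaced by a call to .get() on the matched element. The wrapper elements and the div's content are assumptions added to make the snippet well-formed XML.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the HTML snippet above (wrapper elements are assumed).
page = ET.fromstring(
    '<html><body><div class="myclass" title="My Title">text</div></body></html>'
)

# Select the element with the predicate, then read the attribute in Python.
div = page.find('.//div[@class="myclass"]')
title = div.get("title") if div is not None else None

print(title)  # My Title
```

Note that the predicate compares the whole attribute value, so a div with class="myclass other" would not match.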

Select the link with id ‘my_id’ and then its text

html: <a id="my_id">foo bar</a>

xpath: //a[@id="my_id"]/descendant::text()

returns: ‘foo bar’
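
The descendant axis matters once the link contains nested markup. The sketch below assumes a slightly modified link with a nested ‘b’ element (not in the example above) and uses the standard library's itertext(), since xml.etree.ElementTree has no text() node test; it walks the text of the element and all of its descendants in document order.

```python
import xml.etree.ElementTree as ET

# Assumed variation of the link above, with nested markup inside it.
page = ET.fromstring('<body><a id="my_id">foo <b>bar</b></a></body>')

link = page.find('.//a[@id="my_id"]')

# itertext() yields the element's own text and the text of all descendants.
parts = list(link.itertext()) if link is not None else []
text = "".join(parts)

print(parts)  # ['foo ', 'bar']
print(text)   # foo bar
```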

Testing

Queries can be tested from the command line with ‘xmllint’ (apt install libxml2-utils)

# HTML file:
xmllint --html --xpath '//a[@class="Results"]/@title' example.html
# actual XML, from curl ('-' makes xmllint read from stdin):
curl -s 'http://restapi.adequateshop.com/api/Traveler?page=1' | \
  xmllint --xpath '/TravelerinformationResponse/travelers/Travelerinformation/name' -
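
A path like this can also be checked offline in Python. This is a minimal sketch: the sample XML and the traveler names are made up, only the element names come from the query above, and the real API response may contain more fields. ElementTree does not accept absolute paths on an element, so the query is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; only the element names match the query above.
sample = """
<TravelerinformationResponse>
  <travelers>
    <Travelerinformation><name>Alice</name></Travelerinformation>
    <Travelerinformation><name>Bob</name></Travelerinformation>
  </travelers>
</TravelerinformationResponse>
"""

root = ET.fromstring(sample)

# Relative form of /TravelerinformationResponse/travelers/Travelerinformation/name
names = [n.text for n in root.findall("./travelers/Travelerinformation/name")]

print(names)  # ['Alice', 'Bob']
```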

More info







