Defining custom XPath functions on Python lxml

Parsing with lxml (Python library ) is a very good experience, especially compared to PHP.
An example advantage is that every element found by XPath is once again searchable
(whereas with PHP you can only provide it as a context, but you must provide the DOM ).
However, I came across a case where I needed to split the result to a list (split on python, explode on PHP).

This should be doable when using the "tokenize" function within the XPath query, but for some reason lxml doesn't support it (neither does PHP, I think it's an XPath 2.0 function ).
The solution : make your own custom functions!

1
2
3
4
5
6
7
8
from lxml import etree
ns = etree.FunctionNamespace(None)
ns['tokenize'] = tokenize
def tokenize(context, string, split_token):
    if type(string) is list:
        string = string[0]
    return string.split(split_token)
#example: element.xpath('tokenize(//p,";")')

This is basically it, you just have to make sure the function you're pointing to actually exists.
However, to make this a bit more useful, I decided to separate all my custom functions and pack them in another file (OOP FTW),

For clarity's sake I decided make all the helper functions start with the prefix "xpath_func_"
e.g. "xpath_func_remove_spaces", "xpath_func_replace_dollar_signs" and so on.

helper.py:

1
2
3
4
5
6
7
8
def xpath_func_tokenize(context, string, split_token):
    if type(string) is list:
        string = string[0]
    return string.split(split_token)
def xpath_func_make_title(context, string):
    if type(string) is list:
        string = string[0]
    return string.title()

main.py

1
2
3
4
5
6
7
from lxml import etree
import helper
ns = etree.FunctionNamespace(None)
for helper_func in dir(helper):
    func_prefix = 'xpath_func_'
    if helper_func.startswith(func_prefix):
        ns[helper_func[len(func_prefix):]] = getattr(helper,helper_func)  #function's name is without the prefix

A working example can be found on this Github Repository

Asaf Zamir – Chief Technology Officer & AI/Cloud Consultant. Founder of CloudExpat, AZdev. Author of Coder to CTO.