Defining custom XPath functions in Python lxml
Parsing with lxml (a Python library) is a very good experience, especially compared to PHP.
One advantage is that every element found by XPath is itself searchable again
(whereas in PHP you can only pass the element as a context node, and you still have to query through the DOM document).
However, I came across a case where I needed to split the result into a list (split in Python, explode in PHP).
This should be doable with the "tokenize" function inside the XPath query, but lxml doesn't support it (neither does PHP). tokenize is an XPath 2.0 function, and lxml is built on libxml2, which only implements XPath 1.0.
The solution: make your own custom functions!
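To make the limitation concrete, here is a small sketch of what happens when "tokenize" is called without being registered, along with the plain Python fallback. The sample XML and the variable names are made up for illustration.

```python
from lxml import etree

# Hypothetical sample document, used only for this illustration.
doc = etree.fromstring("<root><tags>red, green, blue</tags></root>")

# tokenize() is XPath 2.0, so an XPath 1.0 engine rejects it as unknown.
error = None
try:
    doc.xpath("tokenize(string(//tags), ',')")
except etree.XPathEvalError as exc:
    error = str(exc)

# Fallback without custom functions: let XPath return the string,
# then split it on the Python side.
parts = [p.strip() for p in doc.xpath("string(//tags)").split(",")]
```

The fallback works, but the splitting logic then lives outside the query, which is what a custom function avoids.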
|
|
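A minimal sketch of the idea, assuming lxml's `etree.FunctionNamespace` is used to register the function in the default (unprefixed) namespace. The function name `xpath_tokenize` and the sample XML are my own placeholders, not necessarily what the original snippet used.

```python
from lxml import etree


def xpath_tokenize(context, value, separator):
    """Split a string (or the text of the first matched node) on a separator.

    lxml passes the evaluation context as the first argument, followed by
    the arguments given in the XPath expression.
    """
    # A node-set argument arrives as a Python list; use its first item.
    if isinstance(value, list):
        value = value[0] if value else ""
    # An element argument is reduced to its text content.
    if hasattr(value, "itertext"):
        value = "".join(value.itertext())
    return [part.strip() for part in str(value).split(separator) if part.strip()]


# None selects the default namespace, so the function needs no prefix in XPath.
ns = etree.FunctionNamespace(None)
ns["tokenize"] = xpath_tokenize

# Hypothetical sample document.
doc = etree.fromstring("<root><tags>red, green, blue</tags></root>")
colors = [str(c) for c in doc.xpath("tokenize(string(//tags), ',')")]
```

Functions registered this way are global to the process. If that is too broad, `xpath()` also accepts an `extensions` argument mapping `(namespace, name)` pairs to functions for a single call.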
That is basically it; you just have to make sure the function you're pointing to actually exists.
However, to make this a bit more useful, I decided to separate all my custom functions and pack them in another file (OOP FTW).
For clarity's sake, I decided to make all the helper functions start with the prefix "xpath_func_",
e.g. "xpath_func_remove_spaces", "xpath_func_replace_dollar_signs" and so on.
helper.py:
|
|
main.py:
|
|
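The two-file layout can be sketched roughly as follows. To keep the example runnable on its own, both parts are shown in a single file. The class name `XPathHelpers`, the `register_helpers` function, the helper bodies and the sample XML are all assumptions for illustration; only the "xpath_func_" prefix and the two helper names come from the text above.

```python
from lxml import etree

PREFIX = "xpath_func_"


# --- would live in helper.py ---
class XPathHelpers:
    """Hypothetical container for custom XPath functions."""

    @staticmethod
    def _to_text(value):
        # Node-sets arrive as lists; elements are reduced to their text.
        if isinstance(value, list):
            value = value[0] if value else ""
        if hasattr(value, "itertext"):
            return "".join(value.itertext())
        return str(value)

    def xpath_func_remove_spaces(self, context, value):
        return self._to_text(value).replace(" ", "")

    def xpath_func_replace_dollar_signs(self, context, value, replacement):
        return self._to_text(value).replace("$", replacement)


# --- would live in main.py ---
def register_helpers(helpers, namespace=None):
    """Register every 'xpath_func_*' method under its name minus the prefix."""
    ns = etree.FunctionNamespace(namespace)
    for attr in dir(helpers):
        if attr.startswith(PREFIX):
            ns[attr[len(PREFIX):]] = getattr(helpers, attr)
    return ns


register_helpers(XPathHelpers())

# Hypothetical sample document.
doc = etree.fromstring("<item><price>$ 1 200</price></item>")
compact = doc.xpath("remove_spaces(//price)")
renamed = doc.xpath("replace_dollar_signs(//price, 'USD')")
```

With this convention, adding a new XPath function only means adding another prefixed method to the helper class; the registration loop picks it up without further changes.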
A working example can be found in this GitHub repository.