AgriCatch source released
Hello everyone!
Today is a great day to release some projects to the wild!
AgriCatch is a data aggregation tool I've built on top of Django.
What AgriCatch does is pretty simple - it lets you grab data from wild and disorganized websites (talking about the HTML of course).
It supports
- XPath - use XPath to find different fields
- Pagination - if you'd like to grab a list of things that are paginated, that's also possible
- Custom functions - for when you'd like to do something special with the fields before saving them to the DB
- Leftover event-related functionality - AgriCatch was originally designed for events, so some event-specific options remain:
  - days_on_page - the number of days of events covered by a single page (for example, 7 for a weekly page)
  - start_day & num_of_days - you can give the importer a default starting day; it will then fill the date into the URL according to the format given in the website template. Example:
    http://www.example_events.com?date=%m-%d-%Y
    The importer would then move forward days_on_page days at a time until it reaches the limit (num_of_days).
- HTML & XML support
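
To make the date stepping a bit more concrete, here is a rough sketch of the idea in plain Python. This is not AgriCatch's actual code: the function name page_urls and the starting date are made up for illustration, and I'm assuming the URL template is simply passed through strftime.

```python
from datetime import date, timedelta


def page_urls(url_template, start_day, num_of_days, days_on_page=1):
    """Yield one URL per page, moving forward days_on_page days each time.

    Hypothetical helper for illustration only, not part of AgriCatch.
    """
    if days_on_page < 1:
        raise ValueError("days_on_page must be at least 1")
    # Offsets 0, days_on_page, 2 * days_on_page, ... stay below num_of_days.
    for offset in range(0, num_of_days, days_on_page):
        current = start_day + timedelta(days=offset)
        # strftime replaces %m, %d and %Y in the template with the date parts.
        yield current.strftime(url_template)


urls = list(
    page_urls(
        "http://www.example_events.com?date=%m-%d-%Y",
        start_day=date(2012, 3, 1),  # arbitrary example date
        num_of_days=21,
        days_on_page=7,
    )
)
for url in urls:
    print(url)
```

With a weekly page and a 21 day limit this gives three URLs, one per week. One thing to watch for with this approach: a URL that already contains percent-encoded characters (like %20) would clash with the date format codes.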
Most of the useful documentation for building importers is in the repository, in agricatch/website.py
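
If you just want a feel for what "XPath fields plus custom functions" means before reading that file, here is a small standalone sketch using only the Python standard library. It does not use AgriCatch's API: the FIELDS layout, the extract function and the sample page are all invented for this example, and ElementTree only handles well-formed markup and a subset of XPath, so truly messy HTML would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page standing in for a real website.
PAGE = """
<html><body>
  <div class="event"><h2> Farmers Market </h2><span class="place">Town Square</span></div>
  <div class="event"><h2>Harvest Fair</h2><span class="place">Old Barn</span></div>
</body></html>
"""

# Hypothetical field definitions: name -> (XPath relative to the item, custom function).
FIELDS = {
    "title": ("./h2", str.strip),
    "place": ("./span[@class='place']", str.upper),
}


def extract(page, item_xpath, fields):
    """Return one dict per matched item, with each field run through its function."""
    root = ET.fromstring(page.strip())
    rows = []
    for item in root.findall(item_xpath):
        row = {}
        for name, (xpath, clean) in fields.items():
            node = item.find(xpath)
            text = node.text if node is not None and node.text else ""
            row[name] = clean(text)  # the "custom function" step before saving
        rows.append(row)
    return rows


rows = extract(PAGE, ".//div[@class='event']", FIELDS)
for row in rows:
    print(row)
```

Each field is located with its own XPath and then cleaned up by a function of your choice, which is roughly the place where you would normalize a value before it goes to the DB.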
To import you simply run the command:
python manage.py doimport website_name --days=7
website_name refers to the name of the website in lowercase!
More info to come..