AgriCatch source released

Hello everyone!
Today is a great day to release some projects to the wild!

AgriCatch is a data aggregation tool I've built on top of Django.
What AgriCatch does is pretty simple - it lets you grab data from messy, disorganized websites (by parsing their HTML, of course).

It supports:

XPath - use XPath to find different fields
Pagination - if the list of things you'd like to grab is spread over several pages, that's also possible
Custom functions - if you'd like to do something special with the fields before saving them to the DB, you can run your own functions on them
Leftover event-related functionality - AgriCatch was originally designed for events, so some event-specific functionality is still there:
- days_on_page - the number of days of events that a single page covers (for example, 7 for a weekly page)
- start_day & num_of_days - you can give the importer a default starting day; it will then attempt to fill the URL with a timestamp (according to a format given in the website template). Example:
  http://www.example_events.com?date=%m-%d-%Y
  The importer would then move forward days_on_page days at a time until it reaches the limit (num_of_days).
HTML & XML support
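
To make the start_day / num_of_days / days_on_page behaviour a bit more concrete, here is a minimal sketch of how the page URLs could be generated. This is not AgriCatch's actual code: the function name build_page_urls and its signature are made up for illustration, and it assumes the URL template contains only strftime placeholders (a literal % elsewhere in the URL would need to be escaped as %%).

```python
from datetime import date, timedelta


def build_page_urls(url_template, start_day, num_of_days, days_on_page):
    """Hypothetical helper: return one URL per page, days_on_page days apart."""
    if days_on_page <= 0:
        raise ValueError("days_on_page must be a positive number of days")
    urls = []
    offset = 0
    # Keep stepping forward until the num_of_days limit is reached.
    while offset < num_of_days:
        page_day = start_day + timedelta(days=offset)
        # strftime fills in the %m-%d-%Y placeholders of the template.
        urls.append(page_day.strftime(url_template))
        offset += days_on_page
    return urls


urls = build_page_urls(
    "http://www.example_events.com?date=%m-%d-%Y",
    start_day=date(2024, 1, 1),
    num_of_days=21,
    days_on_page=7,
)
for url in urls:
    print(url)
# http://www.example_events.com?date=01-01-2024
# http://www.example_events.com?date=01-08-2024
# http://www.example_events.com?date=01-15-2024
```

With a weekly page (days_on_page=7) and a limit of 21 days, that gives three pages to fetch, one per week.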

Most of the useful documentation for building importers is found in the repository, under agricatch/website.py
To run an import, you simply run the command:

python manage.py doimport website_name --days=7

website_name refers to the name of the website in lowercase!

More info to come...

Asaf Zamir – Chief Technology Officer & AI/Cloud Consultant. Founder of CloudExpat, AZdev. Author of Coder to CTO.