lieven at thedatatank.com
Sat Dec 17 09:24:01 UTC 2011
The documentation on http://thedatatank.com should get you started. We are
currently working on a Getting Started section for writing a scraper, so I
will definitely keep you posted on that. Since we just released the
framework (on December 5th) you won't find a lot of information when you
Google it yet, but we are more than willing to explain how to use it.
We chose PHP because The DataTank will most likely end up within an
environment where there are already PHP developers available. (Drupal,
As an ORM we use RedBeanPHP (you might still find some SQL in the code
base, but eventually we will make the code base "SQL free").
XSLT is not used within the framework (you could use XSLT to extend the
framework with custom formatters, though that's completely up to the
developer).
For scrapers we mostly use the PHP DOMDocument or Simple HTML DOM
and regular expressions, which give you similar possibilities as
Specifically for your case the most important features are:
- Logging of Usage Statistics
- Various output formats (JSON, XML, RDF, ...)
- Automated API Documentation
- RESTful API
- Query possibilities
- (we are currently working on a rich GUI which gives end-users the ability
to easily filter/sort/group/chart the data)
On the other hand, I think you should choose the programming language you
feel the most comfortable with, and even when you scrape it in Python we can
still put The DataTank in front of it to give you the features I mentioned.
After all, our goal is opening up data, not discussing which programming
language is the best ;-)
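
To make the "scrape it in Python" route a bit more concrete, here is a
minimal sketch of a scraper that combines an HTML parser with a regular
expression and emits JSON. Everything in it is an assumption made up for
illustration: the sample HTML, the "name" and "votes" class names, and the
shape of the records. It only uses the Python standard library, and it does
not show how The DataTank itself would consume the result.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download the HTML instead.
SAMPLE_HTML = """
<table>
  <tr><td class="name">Alice Example</td><td class="votes">12 votes</td></tr>
  <tr><td class="name">Bob Example</td><td class="votes">7 votes</td></tr>
</table>
"""


class CellParser(HTMLParser):
    """Collect the text of each <td>, grouped per <tr>, keyed by its class."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per completed <tr>
        self._row = None    # dict for the <tr> currently being read
        self._field = None  # class of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a <td> with a class attribute.
        if self._field is not None:
            previous = self._row.get(self._field, "")
            self._row[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape(html):
    """Return a list of {"name": str, "votes": int or None} records."""
    parser = CellParser()
    parser.feed(html)
    parser.close()
    records = []
    for row in parser.rows:
        # The regular expression pulls the number out of text like "12 votes".
        match = re.search(r"\d+", row.get("votes", ""))
        records.append({
            "name": row.get("name", ""),
            "votes": int(match.group()) if match else None,
        })
    return records


if __name__ == "__main__":
    print(json.dumps(scrape(SAMPLE_HTML), indent=2))
```

The point of the sketch is only that the scraper ends with plain structured
data (here JSON), so the language it is written in does not matter much for
whatever is put in front of it afterwards.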
Hope this clarified some things for you.
2011/12/17 Laurent Peuch <psycojoker at gmail.com>
> > This really sounds like something for The DataTank to be useful for.
> > You'll not only get your 2JSON, but also 2XML and some other interfaces
> > for free.
> Well then, can you explain me in more details what really is DataTank
> and how it can help me scrape, store and organize the result of my
> work plz? I've tried to talk to pieterc on irc to ask him the same
> question but he wasn't here.
> The documentation of the website is very poor and I can't understand
> how I'm supposed to use this and how it's supposed to help me :/
> Also, it seems to be php/sql/xslt which is, in my opinion and
> experience, inferior to python/mongodb/BeautifulSoup for this work :/
> But maybe you have some awesome features hidden somewhere :)
> Laurent Peuch -- Bram