[ddj] scraping data with a bookmarklet: Convextra

mirko.lorenz at gmail.com
Fri Apr 12 09:01:00 UTC 2013


With Needlebase you were able to define what needed to be scraped from an
overview page (e.g. a list of all members of the German parliament), then
define which links to follow and which fields to scrape. With a bit of
effort you created a script for multi-page scraping and could send out the
application to collect the defined information; all the results were then
collected into a table. Neat.
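Needlebase's own configuration format isn't described in this thread, so the following is only a minimal Python sketch of the same pattern (overview page, follow the detail links, pick out named fields, collect one table row per page), using just the standard library. The CSS class names ("member", "name", "party"), the helpers scrape() and to_csv(), and the example.org URLs are hypothetical; it assumes each field sits in an element containing plain text, and it ignores pagination of the overview page, robots.txt and rate limiting.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags carrying the given CSS class (hypothetical markup)."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag == "a" and href and self.link_class in (attrs.get("class") or "").split():
            self.links.append(href)


class FieldCollector(HTMLParser):
    """Collects the text of elements whose class names one of the wanted fields."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)
        self.current = None
        self.values = {}

    def handle_starttag(self, tag, attrs):
        for name in (dict(attrs).get("class") or "").split():
            if name in self.fields:
                self.current = name

    def handle_endtag(self, tag):
        # Assumption: field elements contain plain text, so any end tag closes the field.
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.values[self.current] = self.values.get(self.current, "") + data.strip()


def fetch(url):
    """Default fetcher: a plain HTTP GET, decoded as UTF-8."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape(overview_url, link_class, fields, get=fetch):
    """Overview page -> follow each detail link -> one row (dict) per detail page."""
    links = LinkCollector(link_class)
    links.feed(get(overview_url))
    links.close()
    rows = []
    for href in links.links:
        detail_url = urljoin(overview_url, href)  # resolve relative links
        parser = FieldCollector(fields)
        parser.feed(get(detail_url))
        parser.close()
        row = {"url": detail_url}
        row.update({f: parser.values.get(f, "") for f in fields})
        rows.append(row)
    return rows


def to_csv(rows, fields):
    """Serialise the collected rows as CSV text, e.g. for saving locally."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url", *fields])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Passing the fetcher in as `get` keeps the parsing testable offline and leaves room to swap in a politer fetcher (delays, caching) without touching the scraping logic.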

It's kind of amazing that there was a good solution (as I said in another
mail: maybe a bit too good) and that it was simply taken down. There was no
real communication about why that happened, even though I had contacted the
team directly.

Point is: There would be an opportunity here, although I am not sure
whether that could be sustainable.

/Mirko

2013/4/12 Michael Bauer <michael.bauer at okfn.org>

> One thing that struck me as interesting about Convextra is the multi-page
> scraping it does. I hadn't seen this before (I never used Needlebase, though).
>
> Michael
>
> On Thu, Apr 11, 2013 at 02:56:37PM +0200, mirko.lorenz at gmail.com wrote:
> > I wish we still had Needlebase. It would have solved a lot of issues,
> > but it was probably too good: for example, it was possible to scrape page
> > by page with relative ease.
> >
> > 2013/4/11 <SMachlis at computerworld.com>
> >
> > > Agreed, although if you're only scraping a couple of pages it's not too
> > > much of a problem to select all, copy, and paste into a local
> > > spreadsheet.
> > >
> > > ________________________________________
> > >
> > > Indeed, and I would still recommend it over a purely web-based service.
> > > It would be great if the scraper extension allowed saving locally
> > > instead of exporting to Google Docs.
> > >
> > > Michael
> > >
> > >
> > > _______________________________________________
> > > data-driven-journalism mailing list
> > > data-driven-journalism at lists.okfn.org
> > > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> > > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
> > >
>
>
>
> --
> Data Wrangler with the Open Knowledge Foundation (OKFN.org)
> GPG/PGP key: http://tentacleriot.eu/mihi.asc
> Twitter: @mihi_tr Skype: mihi_tr
>
>

