[ddj] scraping data with a bookmarklet: Convextra
Adolfo Antón Bravo
adanton at ucm.es
Fri Apr 12 11:02:24 UTC 2013
Hi
On Fri, 12-04-2013 at 11:01 +0200, mirko.lorenz at gmail.com wrote:
> With Needlebase you were able to define what needed to be scraped
> from an overview page (e.g. list of all parliament members in
> Germany), then you could define what links to follow and what fields
> to scrape. With a bit of effort you created a script for multipage
> scraping and could send out the application to collect the defined
> information - all the results were then collected into a table. Neat.
>
>
> Kind of amazing that there was a good solution (as said in another
> mail: maybe a bit too good) and that it was simply taken down. There
> was no real communication about why that happened, even though I had
> contacted the team directly.
Google and its business model are to blame:
http://www.reporterslab.org/needlebase-dead/
Cheers!
> Point is: There would be an opportunity here, although I am not sure
> whether that could be sustainable.
>
> /Mirko
>
> 2013/4/12 Michael Bauer <michael.bauer at okfn.org>
> One thing that struck me as interesting with Convextra is the
> multi-page scraping it does. This was new to me (I never used
> Needlebase, though).
>
> Michael
>
> On Thu, Apr 11, 2013 at 02:56:37PM +0200, mirko.lorenz at gmail.com wrote:
> > I wish we had Needlebase back. It would have solved a lot of issues,
> > but it was probably too good, e.g. it was possible to scrape page by
> > page with relative ease.
> >
> > 2013/4/11 <SMachlis at computerworld.com>
> >
> > > Agreed, although if you're only scraping a couple of pages it's not too
> > > much of a problem to select all, copy, and paste into a local spreadsheet.
> > >
> > > ________________________________________
> > >
> > > Indeed, and I would still recommend it over a purely web-based service. It
> > > would be great if the scraper extension would allow local saving instead
> > > of Google Docs export.
> > >
> > > Michael
> > >
> > >
> > > _______________________________________________
> > > data-driven-journalism mailing list
> > > data-driven-journalism at lists.okfn.org
> > > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> > > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
> > >
>
>
>
>
> --
> Data Wrangler with the Open Knowledge Foundation (OKFN.org)
> GPG/PGP key: http://tentacleriot.eu/mihi.asc
> Twitter: @mihi_tr Skype: mihi_tr
>
>