[ddj] scraping data with a bookmarklet: Convextra

Adolfo Antón Bravo adanton at ucm.es
Fri Apr 12 11:02:24 UTC 2013


Hi

On Fri, 12-04-2013 at 11:01 +0200, mirko.lorenz at gmail.com wrote:
> With Needlebase you were able to define what needed to be scraped
> from an overview page (e.g. a list of all parliament members in
> Germany), then you could define which links to follow and which
> fields to scrape. With a bit of effort you created a script for
> multi-page scraping and could send out the application to collect
> the defined information - all the results were then collected into
> a table. Neat.
> 
> 
> Kind of amazing that there was a good solution (as said in another
> mail: maybe a bit too good) and that it was simply taken down. There
> was no real communication about why that happened; I had even
> contacted the team directly.
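
For anyone who wants to reproduce the workflow Mirko describes above,
here is a minimal sketch in Python (the site, selectors, and field names
are hypothetical placeholders; it assumes the requests and beautifulsoup4
libraries):

import csv
import requests
from bs4 import BeautifulSoup

BASE = "http://example.org"  # hypothetical stand-in for the real source

# 1. Scrape the overview page and collect the links to follow.
overview = BeautifulSoup(requests.get(BASE + "/members").text, "html.parser")
links = [a["href"] for a in overview.select("ul.members a")]  # hypothetical selector

# 2. Follow each link and scrape the defined fields from the detail page.
rows = []
for href in links:
    detail = BeautifulSoup(requests.get(BASE + href).text, "html.parser")
    rows.append({
        "name": detail.select_one("h1").get_text(strip=True),
        "party": detail.select_one(".party").get_text(strip=True),  # hypothetical field
    })

# 3. Collect all the results into a table - here, a local CSV file.
with open("members.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "party"])
    writer.writeheader()
    writer.writerows(rows)

Needlebase did the equivalent through a point-and-click interface, which
is what made it so approachable.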

Google and its business model are the "guilty" party:
http://www.reporterslab.org/needlebase-dead/

Cheers!

> Point is: There would be an opportunity here, although I am not sure
> whether that could be sustainable. 

> 
> /Mirko
> 
> 2013/4/12 Michael Bauer <michael.bauer at okfn.org>
>         One thing that struck me as interesting about Convextra is the
>         multi-page scraping it does. This was new to me (I never used
>         Needlebase, though).
>         
>         Michael
>         
>         On Thu, Apr 11, 2013 at 02:56:37PM +0200,
>         mirko.lorenz at gmail.com wrote:
>         > I wish we had Needlebase back. It would have solved a lot of
>         > issues, but it was probably too good, e.g. it was possible to
>         > scrape page by page with relative ease.
>         >
>         > 2013/4/11 <SMachlis at computerworld.com>
>         >
>         > > Agreed, although if you're only scraping a couple of pages
>         > > it's not too much of a problem to select all, copy, and
>         > > paste into a local spreadsheet.
>         > >
>         > > ________________________________________
>         > >
>         > > Indeed, and I would still recommend it over a purely
>         > > web-based service. It would be great if the scraper
>         > > extension allowed local saving instead of a Google Docs
>         > > export.
>         > >
>         > > Michael
>         > >
>         > >
>         
>         
>         
>         --
>         Data Wrangler with the Open Knowledge Foundation (OKFN.org)
>         GPG/PGP key: http://tentacleriot.eu/mihi.asc
>         Twitter: @mihi_tr Skype: mihi_tr
>         
>         
>         
> 
> 
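
PS: On Michael's point about local saving rather than a Google Docs
export - a minimal sketch of dumping a scraped HTML table straight into
a local CSV file, assuming pandas (with lxml or html5lib installed for
parsing) and a hypothetical URL:

import pandas as pd

# Parse every <table> element on the page into a list of DataFrames.
tables = pd.read_html("http://example.org/members")  # hypothetical URL

# Write the first table to a plain local CSV - no cloud export involved.
tables[0].to_csv("members.csv", index=False)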





