[School-of-data] Scraping using Google Refine (Workshop notes)

Michael Bauer michael.bauer at okfn.org
Tue Jun 18 14:24:22 UTC 2013


Oleg,

You are perfectly right there - will mention Wikidata. Just took the
Wikipedia page as a starter. Scraping different tables later on.

Michael

On Tue, Jun 18, 2013 at 04:08:22PM +0200, Oleg Lavrovsky wrote:
> Hi Michael,
> 
> Great initiative! I like the writeup and will be glad to review once "the
> crowd clears". One thing I'd like to pick at is your opening example. At
> least, if you scrape Wikipedia data I would mention the
> Wikidata[1] project, where they already have 13M+ points of article data
> ready to be tapped into programmatically, or through an easy-to-use website
> with open formats. Plus they have tools to convert MediaWiki markup without
> trawling through HTML. It would make sense to help Wikidata refine the article,
> while web pages like those of Chilean ministries[2] could themselves be genuinely
> scrapeworthy. Just my 2 cts.
> 
> Cheers,
> Oleg
> 
> [1] https://www.wikidata.org/
> [2]
> http://www.hacienda.cl/english/investor-relations-office/economics-statistics.html
> 
> 
> On Tue, Jun 18, 2013 at 3:23 PM, Michael Bauer <michael.bauer at okfn.org> wrote:
> 
> > Hi,
> >
> > Got 30 minutes of time? Help us out!
> >
> > I used this walkthrough for a workshop yesterday:
> >
> > http://unurl.org/sccl
> >
> > It starts with scraping using Google Docs and later shows how to scrape
> > multi-page documents with Refine.
> >
> > Since this is starting to circulate (thanks to @openrefine) - I'd like to
> > turn it into a recipe/course soon.
> >
> > I need your help: if you have 30 minutes, read the doc and follow the
> > instructions (especially from the section where Refine is used) and comment
> > on anything unclear or ambiguous.
> >
> > Thank you,
> >   Michael
> >
> > --
> > Data Diva | skype: mihi_tr | @mihi_tr
> > The Open Knowledge Foundation | School of Data
> > http://okfn.org | http://schoolofdata.org
> > GPG/PGP key: http://tentacleriot.eu/mihi.asc
> >
> > _______________________________________________
> > School-of-data mailing list
> > School-of-data at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/school-of-data
> > Unsubscribe: http://lists.okfn.org/mailman/options/school-of-data
> >



-- 
Data Diva | skype: mihi_tr | @mihi_tr
The Open Knowledge Foundation | School of Data
http://okfn.org | http://schoolofdata.org 
GPG/PGP key: http://tentacleriot.eu/mihi.asc



