[School-of-data] Scraping using Google Refine (Workshop notes)

Oleg Lavrovsky oleg at utou.ch
Tue Jun 18 14:08:22 UTC 2013


Hi Michael,

Great initiative! I like the writeup and will be glad to review it once "the
crowd clears". One thing I'd like to pick at is your opening example. If you
scrape Wikipedia data, I would at least mention the Wikidata[1] project, which
already has 13M+ points of article data ready to be tapped programmatically or
through an easy-to-use website with open formats. They also have tools to
convert MediaWiki markup, so there is no need to trawl through HTML. It would
make more sense to help Wikidata refine the article, while web pages like
those of the Chilean ministries[2] could be genuinely scrapeworthy. Just my
2 cts.
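
To make "programmatically" a bit more concrete, here is a minimal sketch in
Python. It assumes the wbgetentities action of the Wikidata web API and looks
an item up by the title of its English Wikipedia article. The helper names
(entity_url, english_labels, fetch) and the User-Agent string are made up for
this illustration, and the response layout is only what I would expect from
that action, so treat it as a starting point and not a reference.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def entity_url(title, site="enwiki"):
    """Build a wbgetentities URL for the item linked to a wiki article."""
    params = {
        "action": "wbgetentities",
        "sites": site,        # which wiki the title belongs to
        "titles": title,      # article title on that wiki
        "props": "labels",    # only ask for labels to keep the reply small
        "languages": "en",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def english_labels(response):
    """Map entity id -> English label, skipping entities without one."""
    labels = {}
    for entity_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get("en", {}).get("value")
        if label is not None:
            labels[entity_id] = label
    return labels


def fetch(title):
    """Call the API over the network and return the decoded JSON reply."""
    # Hypothetical User-Agent; replace it with something that identifies you.
    request = Request(entity_url(title),
                      headers={"User-Agent": "school-of-data-example/0.1"})
    with urlopen(request) as reply:
        return json.load(reply)
```

Calling english_labels(fetch("Chile")) should then give a small dict keyed by
the Wikidata item id, with no HTML parsing involved.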

Cheers,
Oleg

[1] https://www.wikidata.org/
[2] http://www.hacienda.cl/english/investor-relations-office/economics-statistics.html


On Tue, Jun 18, 2013 at 3:23 PM, Michael Bauer <michael.bauer at okfn.org> wrote:

> Hi,
>
> Got 30 minutes of time? Help us out!
>
> I used this walkthrough for a workshop yesterday:
>
> http://unurl.org/sccl
>
> It starts with scraping using Google Docs and later shows how to scrape
> multi-page documents with Refine.
>
> Since this is starting to circulate (thanks to @openrefine) - I'd like to
> turn it into a recipe/course soon.
>
> I need your help: if you have 30 minutes, read the doc and follow the
> instructions (especially from the section where Refine is used) and comment
> on anything unclear or ambiguous.
>
> Thank you,
>   Michael
>
> --
> Data Diva | skype: mihi_tr | @mihi_tr
> The Open Knowledge Foundation | School of Data
> http://okfn.org | http://schoolofdata.org
> GPG/PGP key: http://tentacleriot.eu/mihi.asc
>
> _______________________________________________
> School-of-data mailing list
> School-of-data at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/school-of-data
> Unsubscribe: http://lists.okfn.org/mailman/options/school-of-data
>