[School-of-data] [ddj] Geocoding tutorials on the School of Data Blog

Tom Morris tfmorris at gmail.com
Sat Nov 8 16:30:03 UTC 2014


Since it's only two pages, I'd just cut the JSON out of the source display
and paste it into OpenRefine's Create Project from Clipboard dialog.  In 10
minutes you can have all the data from the main page parsed out.  If you
need the narrations block from the detail pages, you can set it up to fetch
and parse the additional HTML.
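
For anyone who would rather script this step than do it in OpenRefine, here is a rough Python sketch of the same idea: find the JSON array embedded in the page source and flatten it to CSV. It is not what I did for the attached files. The sample source, the `var markers` marker, and the field names (`title`, `lat`, `lng`, `link`) are made-up placeholders, so check the real page source and substitute whatever it actually uses.

```python
import csv
import io
import json

# Hypothetical stand-in for the pasted page source. The real page's variable
# name and field names are assumptions here and will likely differ.
SAMPLE_SOURCE = """
<script>
var markers = [{"title": "Incident A", "lat": "1.5", "lng": "104.2", "link": "/detail/1"},
               {"title": "Incident B", "lat": "-6.1", "lng": "106.9", "link": "/detail/2"}];
</script>
"""


def extract_json_array(source, marker):
    """Return the first JSON array that follows `marker` in `source`."""
    start = source.index(marker)      # raises ValueError if the marker is absent
    start = source.index("[", start)  # first opening bracket after the marker
    # raw_decode parses one JSON value and ignores whatever trails it
    value, _end = json.JSONDecoder().raw_decode(source, start)
    return value


def rows_to_csv(rows):
    """Write a list of flat dicts to CSV text, one column per key seen."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = extract_json_array(SAMPLE_SOURCE, "var markers")
csv_text = rows_to_csv(records)
print(csv_text)
```

Using `raw_decode` avoids having to guess where the array ends, since it stops at the end of the first complete JSON value. If the embedded data turns out to be a JavaScript literal rather than strict JSON (unquoted keys, single quotes), this will fail and the text would need cleaning up first.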

I've attached the CSV file and the OpenRefine project for the first page.
You can review the Undo history in OpenRefine to see what I did (and
perhaps even extract & reapply it to the data for the second page if it's
in the same format).

Tom

On Fri, Nov 7, 2014 at 1:05 PM, Idoia Sota <idoiasota at gmail.com> wrote:

> Dear All,
>
>     I'm trying to scrape this map with no success at all:
> https://www.icc-ccs.org/piracy-reporting-centre/live-piracy-map/piracy-map-2013
> and https://www.icc-ccs.org/piracy-reporting-centre/live-piracy-map
>
>      There are two levels of information in it: the one on the tooltip and
> the one on the link that appears on the tooltip.
>
>      I've tried ScraperWiki (JSON), but it gives me an error (I don't even
> know if it makes sense to use it). I then tried to code on ScraperWiki,
> but getting the HTML code gave this error (image attached). It seems I need
> to have some certificate for the page. Nevertheless, I can see all the data
> when I click on "see the html code of this page". (image 2 attached)
>
>    Can anybody tell me what the best way to handle this would be? Can you
> help me? Thank you so much!
>
> Idoia
>
>
>
>
> 2013-02-19 14:55 GMT+01:00 Lucy Chambers <lucy.chambers at okfn.org>:
>
>> Hi All,
>>
>> If anyone has ever wanted to know how to convert simple place names in a
>> spreadsheet to lat and long values so that they can put their data on a
>> map, Rufus Pollock has just put up a couple of tutorials on the School of
>> Data blog.
>>
>> An introduction to Geocoding:
>> http://schoolofdata.org/2013/02/19/geocoding-part-i-introduction-to-geocoding/
>>
>> Geocoding in a Google Docs Spreadsheet:
>> http://schoolofdata.org/2013/02/19/geocoding-part-ii-geocoding-data-in-a-google-docs-spreadsheet/
>>
>>
>> We'll be looking to port them over to the School of Data Handbook in the
>> near future, so please let us know what you think of them (feel free to use
>> the blog comments or the mailing lists!).
>>
>> More soon,
>>
>> Lucy
>>
>> _______________________________________________
>> data-driven-journalism mailing list
>> data-driven-journalism at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>>
>>
>
> _______________________________________________
> school-of-data mailing list
> school-of-data at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/school-of-data
> Unsubscribe: https://lists.okfn.org/mailman/options/school-of-data
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: https-www-icc-ccs-org-piracy-reporting-centre-live-piracy-map.csv
Type: text/csv
Size: 100749 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/school-of-data/attachments/20141108/e0cab7aa/attachment-0002.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: https-www-icc-ccs-org-piracy-reporting-centre-live-piracy-map.openrefine.tar.gz
Type: application/x-gzip
Size: 23950 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/school-of-data/attachments/20141108/e0cab7aa/attachment-0002.bin>

