[open-government] [euopendata] Idea: screen scraping sprint

Tim McNamara paperless at timmcnamara.co.nz
Fri Oct 1 19:31:49 UTC 2010


> On Fri, Oct 1, 2010 at 12:50 PM, Antti Poikola <antti.poikola at gmail.com> wrote:

> >  Hi people,
> >
> > I just got an idea that we could organize maybe even global screen
> > scraping day/camp/sprint with the idea to create open API:s to
> > government data by programming screen scrapers to existing public, but
> > technically not open data sources?
> >
> > So far just a vague idea... anybody interested to brainstorm it further?
> >
> > -Jogi
>

On 2 October 2010 05:04, Jonathan Gray <jonathan.gray at okfn.org> wrote:

> Great idea, Antti!
>
> Sounds like something that the OKF would be very interested in
> supporting. Also we should *definitely* talk to Scraper Wiki about
> this!


Interesting idea.

Every data source I've encountered has needed postprocessing before it could
be redistributed. Spreadsheets from New Zealand departments typically have
mixed numeric and text types, silly headers and other oddities. I assume that
means their own analysts are using dirty data. However, they seem fairly
unresponsive when I say "would you like to replace your current spreadsheet
with a cleaner dataset that I've produced?".
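
To make the "mixed numeric and text types" problem concrete, here is a minimal
Python sketch of the kind of cleanup involved. The function names and the
choice to strip commas as thousands separators are assumptions for
illustration, not taken from any particular department's data or from Google
Refine.

```python
def coerce_cell(value):
    """Return a float when the cell looks numeric, otherwise the stripped text."""
    text = str(value).strip()
    # Assumption: commas are thousands separators, as in "1,200".
    candidate = text.replace(",", "")
    try:
        return float(candidate)
    except ValueError:
        # Not a number; keep the trimmed text so nothing is silently lost.
        return text


def clean_column(cells):
    """Apply coerce_cell to every cell in one spreadsheet column."""
    return [coerce_cell(cell) for cell in cells]
```

For example, `clean_column(["1,200", " 35 ", "n/a", 7])` turns the numeric-looking
cells into floats and leaves "n/a" as text for a person to review.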

I think that being part of a larger body/brand, such as OKFN, will help with
getting buy-in from agencies.

I'm a contributor to Google Refine (née Freebase Gridworks). It's a pretty
good tool for dealing with these kinds of problems. Perhaps scrapers could add
documents to a request queue for processing, and data analysts could then work
through that queue as a crowdsourced effort?
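
A rough sketch of how such a request queue might look, in Python. Everything
here is hypothetical: the class names, the status values and the in-memory
storage are assumptions for illustration, and a real service would need
persistence and some way to authenticate analysts.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class CleanupRequest:
    """One scraped document waiting for a human to clean it (hypothetical)."""
    source_url: str
    raw_text: str
    status: str = "pending"          # pending -> claimed -> done
    analyst: Optional[str] = None
    cleaned_text: Optional[str] = None


class CleanupQueue:
    """In-memory FIFO queue shared by scrapers and volunteer analysts."""

    def __init__(self):
        self._pending = deque()
        self.done = []

    def submit(self, source_url, raw_text):
        """Called by a scraper after it fetches a document."""
        request = CleanupRequest(source_url, raw_text)
        self._pending.append(request)
        return request

    def claim(self, analyst):
        """Hand the oldest pending request to an analyst, or None if empty."""
        if not self._pending:
            return None
        request = self._pending.popleft()
        request.status = "claimed"
        request.analyst = analyst
        return request

    def complete(self, request, cleaned_text):
        """Record the analyst's cleaned version of the document."""
        request.cleaned_text = cleaned_text
        request.status = "done"
        self.done.append(request)
```

A scraper would call `submit()` for each document it fetches; an analyst would
call `claim()`, clean the data in whatever tool they prefer, and then call
`complete()` with the result.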

Tim McNamara
  Masters Candidate in Public Policy, Victoria University of Wellington
  Participant, Open New Zealand