[Okfn-ca] Fwd: [open-government] Examples of open data leading to increase in data quality?

donneesouvertes101 at gmail.com donneesouvertes101 at gmail.com
Fri Aug 30 22:16:51 UTC 2013


As far as I am concerned, data should ideally be clean at the source.

Some information sharing and awareness-raising need to be done.

Citizens who give their time to work on open data should not have to 
bother with data 'cleanliness'.

This is what I hope will be achieved, in the long run, with the SEAO data.

Instead of working afterwards, every month, on the same kinds of 
problems, I hope that by working with the Conseil du Trésor we will 
improve the data integrity. For instance, making sure, before releasing 
the data, that fields with specifications (e.g., province code) do 
comply with the specs. Which is not the case at the moment.

Which brings me to a notion that I have never seen: validation 
specifications. Data providers should explain what is done, if anything, 
to verify the data before release.

In the SEAO case, I am building some code to run verifications when the 
file is released. Hopefully, those verifications will eventually be made 
before the release and errors corrected.
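To illustrate the kind of verification described above, here is a minimal sketch of checking fields against their specifications before (or at) release. The field names, the spec table, and the sample data are all hypothetical; only the province-code example comes from the thread.

```python
import csv
import io

# Hypothetical spec table: each constrained field mapped to its set of
# allowed values. Canadian province/territory codes serve as the example,
# since the SEAO data has a province field with a defined specification.
FIELD_SPECS = {
    "province": {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU",
                 "ON", "PE", "QC", "SK", "YT"},
}

def validate(rows, specs):
    """Return (row number, field, bad value) for every value that does
    not comply with its field's specification."""
    errors = []
    for i, row in enumerate(rows, start=1):
        for field, allowed in specs.items():
            value = row.get(field, "")
            if value not in allowed:
                errors.append((i, field, value))
    return errors

# Small in-memory CSV with one non-compliant province value.
sample = "contract,province\nC-001,QC\nC-002,Quebec\n"
rows = csv.DictReader(io.StringIO(sample))
print(validate(rows, FIELD_SPECS))  # [(2, 'province', 'Quebec')]
```

Run against a released file, a non-empty result is the list of errors the provider would ideally have caught, and published a "validation specification" for, before release.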

This changes the current dynamic, where we wait for the data and try to 
do our best with what we have. It is time to go one step further and 
work with the organisations releasing the data.

Pascal



Le 2013-08-30 15:11, Peder Jakobsen a écrit :
>
>
> On 2013-08-30, at 10:39 AM, Diane Mercier <diane.mercier at gmail.com 
> <mailto:diane.mercier at gmail.com>> wrote:
>
>> I would like to bring your attention to this thread on the quality 
>> of data as published on the list [open-government].
>>
>> Ted Strauss focuses there on the importance of "cleaning" datasets 
>> to improve their quality. This is certainly a major challenge for 
>> public organizations: for forty years, a multitude of information 
>> systems has proliferated, without extended standardization and 
>> openness rules. In my opinion, this is a direct and serious 
>> consequence of the use of proprietary software that we inherit today.
>
> The important task is not cleaning the data, but extracting enough 
> meaningful fields from or associated with those records so you can 
> automate the creation of standard metadata to be indexed by a search 
> engine and then delivered via an API. This is the core task of 99% of 
> all open data work on the planet at the moment. As long as the 
> metadata is good, those with an incentive to use the data will figure 
> out a way to make sense of it. If they don't, they probably don't 
> need the data all that badly (Economics 101).
>
> The vast majority of source code generated by the OKFN serves the 
> purpose of metadata creation. Projects for cleaning data usually 
> wither and die on the vine, because you can't spin straw into gold 
> unless your name is Rumpelstiltskin.
>
> Cleaning actual source data may be a task as massive as curing cancer 
> or putting an end to global warming, but the marginal benefit of a 
> dollar spent on such an effort is suspect, and probably unnecessary.
>
> Peder Jakobsen
> Ottawa
>
>
>
>
>
> _______________________________________________
> Okfn-ca mailing list | Open Knowledge Foundation Network - Groupe local au Canada | Local group in Canada
> Okfn-ca at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/okfn-ca
> Site Web | Website : http://ca.okfn.org
>
> More on | Encore plus sur : @okfnca
>

