[Open-data-census] open-data-census Digest, Vol 18, Issue 3

Pierre Chrzanowski pierre.chrzanowski at gmail.com
Wed Nov 5 15:14:33 UTC 2014


Let's have a concrete example.

In France, company data is publicly available online, for free, in a digital
but non-machine-readable format through a search form. However, it is not
possible to get the whole dataset that way.

The company data is also available in bulk, in a digital machine-readable
format, for 65994€, but that data is not online; it is only available through a
download service.

If I consider the first one:
Data exist
In digital format
Publicly available
Online
For free
Up to date

If I consider the second one:
Data exist
In digital format
Bulk
Machine readable
Up to date

So I should consider and assess only the first one, right?

But apparently this is not what most contributors understood:
http://global.census.okfn.org/dataset/companies
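
To make the question concrete, here is a minimal sketch in Python of the
selection rule proposed further down in this thread (prefer publicly available
over non-public, then online over non-online). The field names, the two
dictionaries and the pick_instance helper are all hypothetical and not part of
the Census tooling; the two entries only restate the French company data
example above.

```python
# Hypothetical sketch: field names and the helper are assumptions,
# not part of the Open Data Census software.

# Criteria compared in order; earlier entries win over later ones.
# Only these two come from the thread ("publicly available", then
# "online"); the "etc." in the proposal is deliberately left open.
PRIORITY = ["publicly_available", "online"]


def pick_instance(instances):
    """Return the published instance to assess, by priority order."""
    # Tuples of booleans compare lexicographically, and True > False.
    return max(
        instances,
        key=lambda inst: tuple(inst.get(c, False) for c in PRIORITY),
    )


search_form = {
    "name": "search form",
    "digital": True,
    "publicly_available": True,
    "online": True,
    "free": True,
    "up_to_date": True,
}
paid_bulk = {
    "name": "paid bulk file",
    "digital": True,
    "bulk": True,
    "machine_readable": True,
    "up_to_date": True,
}

chosen = pick_instance([search_form, paid_bulk])
print(chosen["name"])  # prints "search form"
```

Under these assumptions the search form is the instance that gets assessed,
whatever the order of the candidates. Ties are not handled: max() simply
returns the first of several equally ranked candidates, so a real rule would
need more criteria in the list.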




On Wed Nov 05 2014 at 3:36:14 PM Rufus Pollock <rufus.pollock at okfn.org>
wrote:

> On 5 November 2014 14:32, Pierre Chrzanowski <pierre.chrzanowski at gmail.com>
> wrote:
>
>> Thanks Rufus,
>>
>> One of the major problems I see with this methodology (lack of chaining)
>> is that it actually allows us to assess different instances of publication
>> for the same dataset.
>>
>> For instance, if UK Company Data had also been available online but not in
>> bulk, what would the answers have been?
>>
>
> Obviously, the online one by default, but I feel this is pretty much an edge
> case.
>
>
>> This leads to very different interpretations and contributions, as
>> exemplified by Simon.
>>
>
> What are the exact examples where this has caused a problem? (My apologies
> for missing this if it has already been said.)
>
>
>> I think we should clarify the methodology to help us choose which dataset to
>> assess.
>>
>> For instance, always prefer to assess publicly available data rather than
>> non-publicly available data, and then online rather than non-online, etc.
>>
>
> That definitely seems sensible and I thought it would be implied, but as you
> say, spelling that out could definitely be useful.
>
> Rufus
>
>
>
>> Best
>> Pierre
>>
>> On Mon Nov 03 2014 at 5:30:51 PM Rufus Pollock <rufus.pollock at okfn.org>
>> wrote:
>>
>>> On 3 November 2014 16:11, Pierre Chrzanowski <
>>> pierre.chrzanowski at gmail.com> wrote:
>>>
>>>> Sorry to keep going on, but I actually thought there were some evident
>>>> chains, such as: bulk or format are null if the data is not publicly
>>>> available online. Otherwise it means that one has to have access to the
>>>> unavailable data to confirm the evidence.
>>>>
>>>
>>> These are excellent points Pierre, and we thought quite a bit about the
>>> implication chains last year (and have tried to build some into the survey
>>> logic).
>>>
>>> On the bulk the logic was this: in the UK you used to be able to get the
>>> Companies Register in bulk on CDs but not online. (So this is an example of
>>> bulk being true but online being false).
>>>
>>> Similarly, for format it is again the case that stuff could be available
>>> in a specific format but not publicly online.
>>>
>>>
>>>> For instance, I am being told that government spending data in France
>>>> exists in a reusable format and in bulk. But I cannot access the data, so
>>>> why should I believe this? Should I go to the Ministry?
>>>>
>>>
>>> I would say that is definitely a stretch: if data is not available to
>>> anyone then it would be impossible to know if it is bulk, so I would mark
>>> this as no or unsure in this case. Similarly, on reusable. However, if e.g. the
>>> Ministry made the data available to researchers on CD-ROMs you would be
>>> able to answer this even if not publicly available.
>>>
>>> Rufus
>>>
>>>
>>>> Then, there are actually some questions that consider public
>>>> availability implicitly in their definition, such as the one for bulk [1].
>>>> Those two questions are therefore chained.
>>>>
>>>> I hope that we will be able to sort that out before we publish
>>>> anything. Otherwise, I know there are some people ready to fire :)
>>>>
>>>> Best
>>>>
>>>> [1] Data is available in bulk if the whole dataset can be downloaded or
>>>> accessed easily. Conversely it is considered non-bulk if the citizens are
>>>> limited to just getting parts of the dataset (for example, if restricted to
>>>> querying a web form and retrieving a few results at a time from a very
>>>> large database).
>>>>
>>>>
>>>>
>>>> On Mon Nov 03 2014 at 3:25:59 PM Mor Rubinstein <morchickit at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> Again, thanks for writing.
>>>>>
>>>>> The only chain that we mentioned in the tutorial is the following:
>>>>> If the data is not available, then the system will mark the rest of
>>>>> the questions as 'no'.
>>>>>
>>>>> There is no other chain in the system, and we expected each
>>>>> parameter to be taken into consideration independently. This is done, among
>>>>> other reasons, in order to allow different stakeholders in the open
>>>>> government sphere to understand what they need to focus on in order to
>>>>> improve their openness.
>>>>>
>>>>> I will update the reviewers guide, the site and the tutorial today in
>>>>> order to ensure consistency, and for the documentation of the
>>>>> next Index.
>>>>>
>>>>> Thank you guys for bringing it up, you are making the index better. :-)
>>>>>
>>>>> All the best,
>>>>> Mor
>>>>>
>>>>> On Mon, Nov 3, 2014 at 2:04 PM, Pierre Chrzanowski <
>>>>> pierre.chrzanowski at gmail.com> wrote:
>>>>>
>>>>>> Thanks Graeme,
>>>>>>
>>>>>> I think that Simon was referring to the transnational level criteria
>>>>>> for government spending data.
>>>>>>
>>>>>> @Christian, @Mor it would be good to clarify chained / dependent
>>>>>> questions. It is true there is no proper guideline on that.
>>>>>>
>>>>>> All the best
>>>>>> Pierre
>>>>>>
>>>>>>
>>>>>> On Mon Nov 03 2014 at 2:20:18 PM Graeme Jones <jonesiom at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Pierre
>>>>>>>
>>>>>>> 2/ and 4/  I had a specific email exchange with Christian / Mor to
>>>>>>> clarify chained or independent (independent) to ensure consistency ;O)
>>>>>>>
>>>>>>> 3b/  I think experienced people in the #opendata community typically
>>>>>>> side with the lowest common denominator, you are benchmarking to improve so
>>>>>>> hopefully not already perfect or nothing left to do!
>>>>>>>
>>>>>>> 3b/  similarly the issue is often willing volunteers and/or unpaid
>>>>>>> hours.  I might have been able to persuade someone else to independently
>>>>>>> contribute/review Isle of Man submissions but difficult to justify
>>>>>>> unquantified unpaid hours to do the same for other jurisdictions -- last
>>>>>>> time I did submissions for about 16 countries and this time I allocated any
>>>>>>> spare unpaid hours to briefly review Jersey (ran out of time on Guernsey)
>>>>>>> but added some data on other jurisdictions such as UAE, US Virgin Islands,
>>>>>>> etc.
>>>>>>>
>>>>>>> people that know what/how to look are thin on the ground in big
>>>>>>> countries never mind little countries, hence the importance of mentors
>>>>>>> office hours initiatives etc
>>>>>>>
>>>>>>> 3b/  the push towards a localised UK OGL and financereports.gov.im
>>>>>>> were large steps in an offshore country and required *lots* of unpaid hours
>>>>>>> on lobbying, slidedecks, favours such as indirect legal opinion from HM
>>>>>>> Attorney General, frontline staff training on data cleansing, etc.
>>>>>>> sorry, perhaps I have missed something, but the
>>>>>>> financereports.gov.im microsite shows govt spending in a timescale
>>>>>>> at least as good as most of the best countries and better than most other
>>>>>>> countries and under a localised UK OGL -- the OGL in conjunction with
>>>>>>> independent criteria is largely why the Isle of Man is higher in the charts
>>>>>>>
>>>>>>> in fact the end result of a ranking last year was the Isle of Man
>>>>>>> Government requested membership of the Open Government Partnership, surely
>>>>>>> exactly what anyone in the open government movement should aspire to
>>>>>>> achieve?
>>>>>>>
>>>>>>> also scheduled discussions already include a shift to real time
>>>>>>> reporting of the national accounts with data visualisation as a
>>>>>>> minister/voter/taxpayer frontend
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Graeme Jones
>>>>>>>
>>>>>>> On 3 November 2014 12:00, <open-data-census-request at lists.okfn.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Message: 1
>>>>>>>> Date: Mon, 03 Nov 2014 11:20:20 +0000
>>>>>>>> From: Pierre Chrzanowski <pierre.chrzanowski at gmail.com>
>>>>>>>> To: open-data-census <open-data-census at lists.okfn.org>
>>>>>>>> Cc: "okfn-fr-members at lists.okfn.org" <okfn-fr-members at lists.okfn.org>,
>>>>>>>>         "Simon Chignard - data.gouv.fr" <simon at data.gouv.fr>
>>>>>>>> Subject: [Open-data-census] Serious inconsistencies in the application
>>>>>>>>         of the methodology
>>>>>>>>
>>>>>>>> Hi list, I am forwarding a message from Simon Chignard, who is concerned
>>>>>>>> about the lack of quality and consistency in the current submissions.
>>>>>>>>
>>>>>>>> I think his feedback should be carefully taken into account in the
>>>>>>>> reviewing process.
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>> PS: the text below is a Google Translate version of an email written in
>>>>>>>> French to the OKF France members list.
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> This weekend I spotted what seem to me to be serious inconsistencies in
>>>>>>>> the application of the methodology of the Open Data Index since 2014. I
>>>>>>>> am alerting you because this raises the question of the tool's reliability.
>>>>>>>>
>>>>>>>> 1 / An example: the assessment of open Zipcodes / Postcodes.
>>>>>>>>
>>>>>>>> Consider the postal code file for Spain, Sweden, Canada and France.
>>>>>>>>
>>>>>>>> In these four countries, the situation is the same: a more or less public
>>>>>>>> operator (Correos, Postnummer, Canada Post and La Poste) sells the postal
>>>>>>>> code file on demand.
>>>>>>>>
>>>>>>>> Yet these are the scores for the same file:
>>>>>>>>
>>>>>>>> Zipcode / Canada: 55%
>>>>>>>> http://global.census.okfn.org/entry/ca/postcodes
>>>>>>>>
>>>>>>>> Zipcode / Spain: 45%
>>>>>>>> http://global.census.okfn.org/entry/es/postcodes
>>>>>>>>
>>>>>>>> Zipcode / France: 10%
>>>>>>>> http://global.census.okfn.org/entry/fr/postcodes
>>>>>>>>
>>>>>>>> Zipcode / Sweden: 55%
>>>>>>>> http://global.census.okfn.org/entry/se/postcodes
>>>>>>>>
>>>>>>>>
>>>>>>>> 2 / What is at issue
>>>>>>>>
>>>>>>>> The question raised here is whether the criteria are chained or
>>>>>>>> independent.
>>>>>>>>
>>>>>>>> In France we have (collectively) considered that the criteria are chained.
>>>>>>>> This means that if the data is not available then we mark all other
>>>>>>>> criteria red. However, in all the other countries I looked at, each
>>>>>>>> criterion was taken separately. They consider that data which is legally
>>>>>>>> sold and closed may still be available online, be up to date, be
>>>>>>>> downloadable in bulk, etc.
>>>>>>>>
>>>>>>>> I took the example of Zipcodes, but the same problem exists for other
>>>>>>>> evaluations, for example here:
>>>>>>>> http://global.census.okfn.org/entry/si/companies
>>>>>>>>
>>>>>>>> 3 / An assessment that differs between countries
>>>>>>>>
>>>>>>>> When we look at the evaluations in detail, we also see that the criteria
>>>>>>>> are applied more or less strictly.
>>>>>>>>
>>>>>>>> An example: Zipcode / Slovenia: 55%
>>>>>>>> http://global.census.okfn.org/entry/si/postcodes - the commentary states:
>>>>>>>> Data is available from Post of Slovenia, purpose is hidden in HTML format,
>>>>>>>> not available in bulk and additional skills are needed to extract it.
>>>>>>>> Geodetska uprava (Slovenian equivalent of UK Ordnance Survey) resells bulk
>>>>>>>> data with additional GIS information.
>>>>>>>>
>>>>>>>> So it is enough to scrape the data for it to deserve a score of 55%?
>>>>>>>>
>>>>>>>> One for the road: Finland / Spending: 90%
>>>>>>>> http://global.census.okfn.org/entry/fi/spending - Certain assets data are
>>>>>>>> available on Finnish data portal Avoindata.fi. More information from Netra
>>>>>>>> will be opened in the future.
>>>>>>>>
>>>>>>>> There is clearly a problem in the application of the methodology as
>>>>>>>> described, which is meant to evaluate current availability and not
>>>>>>>> availability "in the future."
>>>>>>>>
>>>>>>>> 3 / A reviewer who is also the editor for a country
>>>>>>>>
>>>>>>>> I looked in detail at the ratings for the Isle of Man, which gets a very
>>>>>>>> good score for the Government Spending file (100%).
>>>>>>>> The evaluation and comment: http://global.census.okfn.org/entry/im/spending
>>>>>>>>
>>>>>>>>
>>>>>>>> The proposed link is this one: http://financereports.gov.im - it in no way
>>>>>>>> corresponds to the criteria of the methodology.
>>>>>>>>
>>>>>>>> The problem seems even more serious for this country: contrary to the
>>>>>>>> answer Mor gave Pierre, it is one and the same person who proposed the
>>>>>>>> evaluation and validated it.
>>>>>>>>
>>>>>>>> 4 / Why is that a problem?
>>>>>>>>
>>>>>>>> There are therefore clearly major inconsistencies in how the criteria are
>>>>>>>> applied to each country. If the goal is to produce a ranking of countries
>>>>>>>> (and not to assess each one individually), that is a problem. It is even a
>>>>>>>> serious problem, given that 10 places in the ranking are decided within
>>>>>>>> about 10%!
>>>>>>>>
>>>>>>>> The only solution, it seems to me, is for the OKF to ensure that the
>>>>>>>> assessment is consistent across all countries. Otherwise it is the
>>>>>>>> credibility of the ranking that is called into question.
>>>>>>>>
>>>>>>>> Simon
>>>>>>>>
>>>>>>>> PS: the issue had also already been raised in 2012 for the W3C ranking:
>>>>>>>> https://lists.okfn.org/pipermail/euopendata/2013-February/001153.html
>>>>>>>> - so I do not feel that the problem is only being discovered now.
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>>
>>>>>>>> Subject: Digest Footer
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> open-data-census mailing list
>>>>>>>> open-data-census at lists.okfn.org
>>>>>>>> https://lists.okfn.org/mailman/listinfo/open-data-census
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>>
>>>>>>>> End of open-data-census Digest, Vol 18, Issue 3
>>>>>>>> ***********************************************
>>>>>>>>
>>> --
>>>
>>> Rufus Pollock
>>> Founder and President | skype: rufuspollock |
>>> @rufuspollock <https://twitter.com/rufuspollock>
>>> Open Knowledge <http://okfn.org/> - see how data can change the world
>>> http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on
>>> Facebook <https://www.facebook.com/OKFNetwork> | Blog
>>> <http://blog.okfn.org/>
>>>
>>> The Open Knowledge Foundation is a not-for-profit organisation.  It is
>>> incorporated in England & Wales as a company limited by guarantee, with
>>> company number 05133759.  VAT Registration № GB 984404989. Registered
>>> office address: Open Knowledge Foundation, St John’s Innovation Centre,
>>> Cowley Road, Cambridge, CB4 0WS, UK.
>>>