[@OKau] "High value" datasets
Steven De Costa
steven.decosta at linkdigital.com.au
Mon Apr 20 01:47:24 UTC 2015
This is also a good read... recently prepared and lots of references:
https://www.law.berkeley.edu/files/Final_OpenDataLitReview_2015-04-14_1.1.pdf
STEVEN DE COSTA | EXECUTIVE DIRECTOR
www.linkdigital.com.au
On 20 April 2015 at 09:47, Cassie Findlay <findlay.cassie at gmail.com> wrote:
> Belatedly, thanks again everyone. Here's a quick and dirty summary of what
> I've learned:
>
> - feedback on the quality of published datasets should be routinely
> gathered and fed back to creators and decision makers
> - downloads and page hits should be monitored to gauge the popularity
> of published datasets, of standard website pages, and of FOI requests -
> all of which helps build a picture of public interest trends (a small
> aggregation sketch follows this list)
> - the open data index offers a valuable bird's eye view of where we
> are at in Australia and globally, and where there are gaps and more
> progress needs to be made
> - criteria for 'value' can vary depending on the discipline (see RSDI
> example), but how publicly accessible a dataset is remains a key
> measure for deciding whether its maintenance is worth supporting in $
> terms
> - the adoption of standards and tools for a frictionless data
> ecosystem (see OKFN Frictionless Data) will in turn increase the value
> of data in the community
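>
> As a rough illustration of the popularity point above, here's a minimal
> sketch in Python. The input rows and the weighting are made up for the
> example, not taken from any real portal:
>
> from collections import defaultdict
>
> def rank_by_popularity(rows):
>     """Combine download and page-hit counts per dataset and rank them."""
>     totals = defaultdict(int)
>     for dataset_id, downloads, page_hits in rows:
>         # Arbitrary weighting: a download signals stronger interest
>         # than a page view, so count it three times as heavily.
>         totals[dataset_id] += 3 * downloads + page_hits
>     return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
>
> rows = [
>     ("garbage-collection-zones", 120, 900),
>     ("street-centrelines", 40, 300),
>     ("budget-2014-15", 75, 1500),
> ]
> for dataset_id, score in rank_by_popularity(rows):
>     print(score, dataset_id)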
>
> Some neat matrices / sets of criteria include:
>
> *City of Philadelphia:*
>
> - *Publication Quality* - The team found that whether a dataset was
> “published” is more complicated than “true or false,” and thus recorded
> information about what formats were available, how up-to-date they were,
> how well documented they were, etc., and used that information to inform a
> publication quality score.
> - *Other Cities* - To get a sense of what high demand datasets were
> being released elsewhere and help inform departments of existing
> precedents, the team researched the data portals of four other major U.S.
> cities - Baltimore, Boston, Chicago, and New York City. Popular datasets
> not yet published by the City of Philadelphia were recorded as
> “unpublished” datasets.
> - *Demand / Impact* - The team used information derived from an
> analysis of over 2,800 Right to Know requests, voting on the Open Data
> Pipeline Trello board, and nominations on OpenDataPhilly.org to estimate
> demand for each dataset on a scale of 1-5 (5 being greatest).
> - *Cost / Complexity* - Information about the level of technical
> effort required to prepare each dataset for publishing was used to
> produce an estimate of cost/complexity on a scale of 1-5 (5 being
> greatest). A toy sketch combining these two scales follows this list.
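>
> A toy version of the demand/cost part of that matrix. The field names
> and the "highest demand, lowest cost first" ordering are my assumptions
> for illustration, not the Philadelphia team's actual method:
>
> from dataclasses import dataclass
>
> @dataclass
> class Candidate:
>     name: str
>     demand: int  # 1-5, 5 = greatest demand
>     cost: int    # 1-5, 5 = hardest to prepare
>
> def prioritise(candidates):
>     """Quick wins first: highest demand, then lowest cost/complexity."""
>     return sorted(candidates, key=lambda c: (-c.demand, c.cost))
>
> # Example backlog with invented datasets and scores.
> backlog = [
>     Candidate("property assessments", demand=5, cost=4),
>     Candidate("crime incidents", demand=5, cost=2),
>     Candidate("bike rack locations", demand=2, cost=1),
> ]
> for c in prioritise(backlog):
>     print(c.demand, c.cost, c.name)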
>
> *Steve Bennett:*
>
> "three criteria when pondering priorities for government data release:
> 1. Uniqueness: to what extent are there no other sources of this
> information? A council's collection of street
> information is valuable but there's a lot of overlap with OpenStreetMap,
> for instance. But no one else could have
> the garbage collection zone boundaries.
> 2. Maintenance. Datasets age pretty quickly, and a dataset that's more
> than a year out of date seems to go
> downhill in value pretty fast.
> 3. Reusability: was the data being collected with a general purpose in
> mind, or are there limitations due to the
> original purpose for which it was collected (eg, lack of
> comprehensiveness, idiosyncratic groupings, jurisdictional
> filtering...)"
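>
> Steve's maintenance criterion is the easiest one to automate. A minimal
> sketch, assuming each dataset's metadata carries a last-updated ISO date
> (the one-year threshold is his rule of thumb, not a standard):
>
> from datetime import date, timedelta
>
> STALE_AFTER = timedelta(days=365)
>
> def is_stale(last_updated, today=None):
>     """True if the dataset is more than a year out of date."""
>     today = today or date.today()
>     return today - date.fromisoformat(last_updated) > STALE_AFTER
>
> # Invented example metadata.
> datasets = {
>     "garbage-collection-zones": "2015-03-01",
>     "street-tree-inventory": "2013-06-30",
> }
> for name, updated in datasets.items():
>     flag = "STALE" if is_stale(updated, today=date(2015, 4, 20)) else "ok"
>     print(flag, name, updated)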
>
> *EU*
>
> (value as seen from a publisher's perspective):
>
> "a dataset may be considered of high - value when one or more of the
> following criteria are met
>
> It contributes to transparency:
> These datasets are published because they increase the transparency
> and openness of the government towards its citizens. For
> instance the publication of parliaments’ data, such as election
> results, or the way go vernmental budgets are spent, or staff cost
> of public administrations all contribute to the transparency of the
> w
> ay public administrations are working.
>
> Its publication is subject to a legal obligation:
> In some cases the publication of data is enforced by law.
> The PSI Directive, for instance, regulates the publication of
> policy-related documents by (semi-)public organisations.
>
> It directly or indirectly relates to their public task:
> A public administration may publish a dataset because it directly relates
> to its public task. For instance DG CLIMA may publish statistics
> on CO2-emission as part of its task for raising awareness about climate
> change.
>
> It realises a cost reduction:
> The availability and re-use of a dataset, e.g. contact information, code
> lists, reference data and controlled vocabularies, eliminates the
> need for duplication of data and effort, reduces costs and increases
> interoperability.
> Collections of data housed in base registers, and geospatial
> data, are prime examples of datasets whose opening up will lead to
> direct cost reductions in data management, production and exchange.
>
> The type and size of its target audience:
> A dataset may be useful for/relevant to a large audience (size-based
> value), for instance traffic data.
> On the other hand, a dataset may bring large value to a specific target
> audience (target/subject-based value), for instance a dataset
> containing data on particles colliding at high speed in a
> particle accelerator."
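>
> The report's rule ("one or more of the following criteria are met")
> reduces to an any-of predicate. A sketch where the flag names are my own
> shorthand for the five criteria, not the report's schema:
>
> CRITERIA = (
>     "contributes_to_transparency",
>     "publication_legally_required",
>     "relates_to_public_task",
>     "realises_cost_reduction",
>     "significant_target_audience",
> )
>
> def is_high_value(dataset):
>     """True if the dataset meets at least one of the five criteria."""
>     return any(dataset.get(flag, False) for flag in CRITERIA)
>
> print(is_high_value({"relates_to_public_task": True}))   # True
> print(is_high_value({"realises_cost_reduction": False})) # False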
>
>
> Cheers
> Cassie
>
>
>
>
> On Wed, Apr 15, 2015 at 5:41 PM, Steve Bennett <stevage at gmail.com> wrote:
>
>> On Tue, Apr 14, 2015 at 4:30 PM, Cassie Findlay
>> <findlay.cassie at gmail.com> wrote:
>>
>>> Has anyone come across good criteria or defined methods for identifying
>>> 'high value' datasets - if, for example, you are looking at a
>>> whole-of-government jurisdiction? I found some in this EU report
>>> <http://ec.europa.eu/isa/documents/publications/report-on-high-value-datasets-from-eu-institutions_en.pdf>
>>> but would like to gather some more.
>>>
>>> I realise that value is a highly subjective thing to assert (valuable
>>> for whom, why?) and really like Rosie's work on defining the problems
>>> first, in order to then work out where you might find datasets of value,
>>> but all that aside :) - are there examples out there of work to define high
>>> value stuff?
>>>
>>
>> Other people have commented on other frameworks etc for assessing value,
>> but informally, I've found myself focusing on three criteria when pondering
>> priorities for government data release:
>>
>> 1. Uniqueness: to what extent are there no other sources of this
>> information? A council's collection of street information is valuable but
>> there's a lot of overlap with OpenStreetMap, for instance. But no one else
>> could have the garbage collection zone boundaries.
>> 2. Maintenance. Datasets age pretty quickly, and a dataset that's more
>> than a year out of date seems to go downhill in value pretty fast.
>> 3. Reusability: was the data being collected with a general purpose in
>> mind, or are there limitations due to the original purpose for which it was
>> collected (eg, lack of comprehensiveness, idiosyncratic groupings,
>> jurisdictional filtering...)
>>
>> Steve
>>