[@OKau] "High value" datasets
Cassie Findlay
findlay.cassie at gmail.com
Sun Apr 19 23:47:51 UTC 2015
Belatedly, thanks again everyone. Here's a quick and dirty summary of what
I've learned:
- feedback on the quality of published datasets should be routinely
gathered and fed back to creators and decision makers
- downloads and page hits should be monitored to gauge the popularity of
published datasets, as well as of standard website pages and FOI requests -
all of which helps build a picture of public interest trends (see the
log-counting sketch after this list)
- the Open Data Index offers a valuable bird's-eye view of where we are at
in Australia and globally, and where there are gaps and more progress needs
to be made
- criteria for 'value' can vary depending on the discipline (see RSDI
example), but having the dataset (more) publicly accessible is a key
measure for determining whether its maintenance is worth supporting in $ terms
- adopting standards and tools for a frictionless data ecosystem (see OKFN
Frictionless Data) will in turn increase the value of data in the community
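To make the monitoring point above a little more concrete, here is a rough
sketch (Python, standard library only) that tallies hits per dataset from a
web server access log. The log path and the /dataset/<name> URL pattern are
assumptions on my part - adjust both for whatever portal you actually run.

import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical log file location
DATASET_RE = re.compile(r'"GET /dataset/([^/"\s]+)[^"]* HTTP')  # assumed URL pattern

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        match = DATASET_RE.search(line)
        if match:
            counts[match.group(1)] += 1

# Most-requested datasets first - a crude proxy for public interest trends.
for dataset, hits in counts.most_common(20):
    print(f"{hits:6d}  {dataset}")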
Some neat matrices / sets of criteria include:
*City of Philadelphia:*
- *Publication Quality* - The team found that whether a dataset was
“published” is more complicated than “true or false,” and thus recorded
information about what formats were available, how up-to-date they were,
how well documented they were, etc., and used that information to inform a
publication quality score.
- *Other Cities* - To get a sense of what high demand datasets were
being released elsewhere and help inform departments of existing
precedents, the team researched the data portals of four other major U.S.
cities - Baltimore, Boston, Chicago, and New York City. Popular datasets
not yet published by the City of Philadelphia were recorded as
“unpublished” datasets.
- *Demand / Impact* - The team used information derived from an
analysis of over 2,800 Right to Know requests, voting on the Open Data
Pipeline Trello board, and nominations on OpenDataPhilly.org to estimate
demand for each dataset using a scale of 1-5 (5 being greatest)
- *Cost / Complexity* - Information about the level of technical effort
required to prepare each dataset for publishing was used to produce an
estimate of the cost/complexity on a scale of 1-5 (5 being greatest)
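Out of curiosity I sketched how those four Philadelphia fields might roll up
into a single release priority. The weighting below (demand discounted by
cost for unpublished datasets, demand times the publication-quality shortfall
for published ones) is entirely my own assumption, not how their team
actually scored things.

# Sketch only: one plausible way to combine Philadelphia-style fields into a
# release priority. The combination rule is an assumption, not their method.
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    name: str
    published: bool
    publication_quality: int  # 0-5, how well published (formats, freshness, docs)
    demand: int               # 1-5, from Right to Know requests, Trello votes, nominations
    cost: int                 # 1-5, technical effort to prepare for publishing

def release_priority(d: DatasetEntry) -> float:
    """Higher means 'release or improve this sooner' (assumed weighting)."""
    if not d.published:
        # Unpublished: high demand and low cost float to the top.
        return d.demand - 0.5 * d.cost
    # Published: low publication quality plus high demand suggests rework.
    return d.demand * (5 - d.publication_quality) / 5

catalogue = [
    DatasetEntry("property-assessments", False, 0, 5, 2),
    DatasetEntry("crime-incidents", True, 3, 5, 1),
    DatasetEntry("tree-inventory", True, 5, 2, 1),
]
for entry in sorted(catalogue, key=release_priority, reverse=True):
    print(f"{release_priority(entry):5.2f}  {entry.name}")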
*Steve Bennett:*
"three criteria when pondering priorities for government data release:
1. Uniqueness: to what extent are there no other sources of this
information? A council's collection of street
information is valuable but there's a lot of overlap with OpenStreetMap,
for instance. But no one else could have
the garbage collection zone boundaries.
2. Maintenance. Datasets age pretty quickly, and a dataset that's more than
a year out of date seems to go
downhill in value pretty fast.
3. Reusability: was the data being collected with a general purpose in
mind, or are there limitations due to the
original purpose for which it was collected (eg, lack of comprehensiveness,
idiosyncratic groupings, jurisdictional
filtering...)"
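Steve's maintenance point suggests a simple staleness discount. The
exponential shape and the roughly one-year half-life in this sketch are my
own illustrative assumptions, picked only to match the 'goes downhill pretty
fast after a year' intuition.

# Illustrative only: value decaying with dataset age. The exponential shape and
# the ~1-year half-life are assumptions, not a measured model.
import math

def age_discount(days_since_update: float, half_life_days: float = 365.0) -> float:
    """Fraction of original value retained after a given staleness (assumed model)."""
    return 0.5 ** (days_since_update / half_life_days)

for days in (0, 90, 365, 730):
    print(f"{days:4d} days stale -> {age_discount(days):.0%} of original value")

Plugging in 0, 90, 365 and 730 days gives roughly 100%, 84%, 50% and 25% of
the original value under those assumptions.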
*EU*
(value as seen from a publisher's perspective):
"a dataset may be considered of high value when one or more of the
following criteria are met:
It contributes to transparency:
These datasets are published because they increase the transparency
and openness of the government towards its citizens. For instance the
publication of parliaments’ data, such as election results, or the way
governmental budgets are spent, or staff costs of public administrations
all contribute to the transparency of the way public administrations are
working.
Its publication is subject to a legal obligation:
In some cases the publication of data is enforced by law.
The PSI Directive, for instance, regulates the publication of
policy-related documents by (semi-)public organisations.
It directly or indirectly relates to their public task:
A public administration may publish a dataset because it directly relates
to its public task. For instance, DG CLIMA may publish statistics
on CO2 emissions as part of its task of raising awareness about climate
change.
It realises a cost reduction:
The availability and re-use of a dataset, e.g. contact information, code
lists, reference data and controlled vocabularies, eliminates the
need for duplication of data and effort, reduces costs and increases
interoperability.
Collections of data housed in the base registers and geospatial
data are prime examples of datasets whose opening up will lead to
direct cost reductions in data management, production and exchange.
The type and size of its target audience:
A dataset may be useful for/relevant to a large audience (size-based
value), for instance traffic data.
On the other hand, a dataset may bring large value to a specific target
audience (target/subject-based value), for instance a dataset
containing data on particles colliding at high speed in a
particle accelerator."
Cheers
Cassie
On Wed, Apr 15, 2015 at 5:41 PM, Steve Bennett <stevage at gmail.com> wrote:
> On Tue, Apr 14, 2015 at 4:30 PM, Cassie Findlay <findlay.cassie at gmail.com>
> wrote:
>
>> Has anyone come across good criteria or defined methods for identifying
>> 'high value' datasets? If, for example, you are looking at a whole of
>> government jurisdiction. I found some in this EU report
>> <http://ec.europa.eu/isa/documents/publications/report-on-high-value-datasets-from-eu-institutions_en.pdf>
>> but would like to gather some more.
>>
>> I realise that value is a highly subjective thing to assert (valuable for
>> whom, why?) and really like Rosie's work on defining the problems first, in
>> order to then work out where you might find datasets of value, but all that
>> aside :) - are there examples out there of work to define high value stuff?
>>
>
> Other people have commented on other frameworks etc for assessing value,
> but informally, I've found myself focusing on three criteria when pondering
> priorities for government data release:
>
> 1. Uniqueness: to what extent are there no other sources of this
> information? A council's collection of street information is valuable but
> there's a lot of overlap with OpenStreetMap, for instance. But no one else
> could have the garbage collection zone boundaries.
> 2. Maintenance. Datasets age pretty quickly, and a dataset that's more
> than a year out of date seems to go downhill in value pretty fast.
> 3. Reusability: was the data being collected with a general purpose in
> mind, or are there limitations due to the original purpose for which it was
> collected (eg, lack of comprehensiveness, idiosyncratic groupings,
> jurisdictional filtering...)
>
> Steve
>