[@OKau] "High value" datasets

Cassie Findlay findlay.cassie at gmail.com
Sun Apr 19 23:47:51 UTC 2015

Belatedly, thanks again everyone. Here's a quick and dirty summary of what
I've learned:

   - feedback on the quality of published datasets should be routinely
   gathered and fed back to creators and decision makers
   - downloads and page hits should be monitored for popularity of
   published datasets as well as of standard website pages and FOI requests -
   all of which helps build a picture of public interest trends
   - the open data index offers a valuable bird's eye view of where we
   are at in Australia and globally, and of where there are gaps and more
   progress needs to be made
   - criteria for 'value' can vary depending on the discipline (see RSDI
   example) but having the dataset (more) publicly accessible is a key measure
   for determining how much its maintenance is worth supporting in $ terms
   - the adoption of standards and tools for a frictionless data ecosystem
   (see OKFN Frictionless data) will in turn increase the value of data in
   the community

Some neat matrices / sets of criteria include:

*City of Philadelphia:*

   - *Publication Quality* - The team found that whether a dataset was
   “published” is more complicated than “true or false,” and thus recorded
   information about what formats were available, how up-to-date they were,
   how well documented they were, etc., and used that information to inform a
   publication quality score.
   - *Other Cities* - To get a sense of what high demand datasets were
   being released elsewhere and help inform departments of existing
   precedents, the team researched the data portals of four other major U.S.
   cities - Baltimore, Boston, Chicago, and New York City. Popular datasets
   not yet published by the City of Philadelphia were recorded as
   “unpublished” datasets.
   - *Demand / Impact* - The team used information derived from an
   analysis of over 2,800 Right to Know requests, voting on the Open Data
   Pipeline Trello board, and nominations on OpenDataPhilly.org to estimate
   demand for each dataset using a scale of 1-5 (5 being greatest).
   - *Cost / Complexity* - Information about the level of technical effort
   required to prepare each dataset for publishing was used to produce an
   estimate of the cost/complexity on a scale of 1-5 (5 being greatest).
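
As a rough illustration only: the two 1-5 scores above could be combined into a release order along these lines. The Philadelphia write-up quoted here does not say how (or whether) the scores were combined, so the demand-over-cost ratio is my own assumption, and the dataset names and scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    demand: int  # Demand / Impact, 1-5 (5 being greatest)
    cost: int    # Cost / Complexity, 1-5 (5 being greatest)


def priority(d: Dataset) -> float:
    # Assumed heuristic, not Philadelphia's method:
    # favour high demand and low cost.
    return d.demand / d.cost


def rank(datasets):
    # Highest priority first.
    return sorted(datasets, key=priority, reverse=True)


# Hypothetical datasets and scores, for illustration only.
candidates = [
    Dataset("hypothetical-crime-incidents", demand=5, cost=4),
    Dataset("hypothetical-bin-collection-zones", demand=3, cost=1),
    Dataset("hypothetical-budget-line-items", demand=4, cost=2),
]

for d in rank(candidates):
    print(f"{d.name}: priority {priority(d):.2f}")
```

A ratio is only one option. A jurisdiction might just as reasonably plot the two scores on a quadrant chart and start with the high demand, low cost corner.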

*Steve Bennett:*

"three criteria when pondering priorities for government data release:
1. Uniqueness: to what extent are there no other sources of this
information? A council's collection of street information is valuable but
there's a lot of overlap with OpenStreetMap, for instance. But no one else
could have the garbage collection zone boundaries.
2. Maintenance. Datasets age pretty quickly, and a dataset that's more
than a year out of date seems to go downhill in value pretty fast.
3. Reusability: was the data being collected with a general purpose in
mind, or are there limitations due to the original purpose for which it was
collected (eg, lack of comprehensiveness, idiosyncratic groupings,
jurisdictional filtering...)"

*EU:*

(value as seen from a publisher's perspective):

"a dataset may be considered of high-value when one or more of the
following criteria are met

It contributes to transparency:
These datasets are published because they increase the transparency and
openness of the government towards its citizens. For instance the
publication of parliaments’ data, such as election results, or the way
governmental budgets are spent, or staff cost of public administrations
all contribute to the transparency of the way public administrations are
working.

Its publication is subject to a legal obligation:
In some cases the publication of data is enforced by law.
The PSI Directive for instance, regulates the publication of
policy-related documents by (semi) public organisations.

It directly or indirectly relates to their public task:
A public administration may publish a dataset because it directly relates
to its public task. For instance DG CLIMA may publish statistics on
CO2-emission as part of its task for raising awareness about climate

It realises a cost reduction:
The availability and re-use of a dataset, e.g. contact information, code
lists, reference data and controlled vocabularies, eliminates the need for
duplication of data and effort, reduces costs and increases
Collections of data housed in the base registers and geospatial data are
prime examples of datasets whose opening up will lead to direct cost
reductions in data management, production and exchange.

The type and size of its target audience:
A dataset may be useful for/relevant to a large audience (size-based
value), for instance traffic data.
On the other hand a dataset may bring large value to a specific target
audience (target/subject-based value), for instance a dataset containing
data of particles colliding at high speed in a particle accelerator."
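
The "one or more of the following criteria" rule in the EU extract is easy to sketch as a check. This is my own minimal sketch, not anything from the report: the criterion identifiers simply paraphrase the five headings quoted above, and the function name is invented.

```python
# Identifiers paraphrase the five headings in the EU extract above;
# the names themselves are invented for this sketch.
EU_CRITERIA = frozenset({
    "transparency",
    "legal_obligation",
    "public_task",
    "cost_reduction",
    "target_audience",
})


def is_high_value(criteria_met) -> bool:
    """Return True when at least one of the EU criteria is met."""
    met = set(criteria_met)
    unknown = met - EU_CRITERIA
    if unknown:
        # Guard against typos in criterion names.
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(met) >= 1


print(is_high_value({"legal_obligation"}))
print(is_high_value(set()))
```

Because a single criterion is enough, a check like this will flag a lot of datasets. In practice you would probably also want to record which criteria were met, so that datasets can be compared rather than just passed or failed.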


On Wed, Apr 15, 2015 at 5:41 PM, Steve Bennett <stevage at gmail.com> wrote:

> On Tue, Apr 14, 2015 at 4:30 PM, Cassie Findlay <findlay.cassie at gmail.com>
> wrote:
>> Has anyone come across good criteria or defined methods for identifying
>> 'high value' datasets? If, for example, you are looking at a whole of
>> government jurisdiction. I found some in this EU report
>> <http://ec.europa.eu/isa/documents/publications/report-on-high-value-datasets-from-eu-institutions_en.pdf>
>> but would like to gather some more.
>> I realise that value is a highly subjective thing to assert (valuable for
>> whom, why?) and really like Rosie's work on defining the problems first, in
>> order to then work out where you might find datasets of value, but all that
>> aside :) - are there examples out there of work to define high value stuff?
> Other people have commented on other frameworks etc for assessing value,
> but informally, I've found myself focusing on three criteria when pondering
> priorities for government data release:
> 1. Uniqueness: to what extent are there no other sources of this
> information? A council's collection of street information is valuable but
> there's a lot of overlap with OpenStreetMap, for instance. But no one else
> could have the garbage collection zone boundaries.
> 2. Maintenance. Datasets age pretty quickly, and a dataset that's more
> than a year out of date seems to go downhill in value pretty fast.
> 3. Reusability: was the data being collected with a general purpose in
> mind, or are there limitations due to the original purpose for which it was
> collected (eg, lack of comprehensiveness, idiosyncratic groupings,
> jurisdictional filtering...)
> Steve