[@OKau] "High value" datasets
Steven De Costa
steven.decosta at linkdigital.com.au
Mon Apr 20 01:47:24 UTC 2015
This is also a good read... recently prepared and lots of references:
https://www.law.berkeley.edu/files/Final_OpenDataLitReview_2015-04-14_1.1.pdf
STEVEN DE COSTA | EXECUTIVE DIRECTOR
www.linkdigital.com.au
On 20 April 2015 at 09:47, Cassie Findlay <findlay.cassie at gmail.com> wrote:
> Belatedly, thanks again everyone. Here's a quick and dirty summary of what
> I've learned:
>
> - feedback on the quality of published datasets should be routinely
> gathered and fed back to creators and decision makers
> - downloads and page hits should be monitored to gauge the popularity
> of published datasets, of standard website pages, and of FOI requests -
> all of which helps build a picture of public interest trends (a small
> aggregation sketch follows this list)
> - the open data index offers a valuable bird's eye view of where we
> are at in Australia and globally, and where there are gaps and more
> progress needs to be made
> - criteria for 'value' can vary depending on the discipline (see RSDI
> example), but how publicly accessible a dataset is remains a key
> measure for deciding whether its maintenance is worth supporting in $
> terms
> - the adoption of standards and tools for a frictionless data
> ecosystem (see OKFN Frictionless Data) will in turn increase the value
> of data in the community
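>
> As a rough illustration of the popularity point above, here's a minimal
> sketch in Python. The input rows and the weighting are made up for the
> example, not taken from any real portal:
>
> from collections import defaultdict
>
> def rank_by_popularity(rows):
>     """Combine download and page-hit counts per dataset and rank them."""
>     totals = defaultdict(int)
>     for dataset_id, downloads, page_hits in rows:
>         # Arbitrary weighting: a download signals stronger interest
>         # than a page view, so count it three times as heavily.
>         totals[dataset_id] += 3 * downloads + page_hits
>     return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
>
> rows = [
>     ("garbage-collection-zones", 120, 900),
>     ("street-centrelines", 40, 300),
>     ("budget-2014-15", 75, 1500),
> ]
> for dataset_id, score in rank_by_popularity(rows):
>     print(score, dataset_id)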
>
> Some neat matrices / sets of criteria include:
>
> *City of Philadelphia:*
>
> - *Publication Quality* - The team found that whether a dataset was
> “published” is more complicated than “true or false,” and thus recorded
> information about what formats were available, how up-to-date they were,
> how well documented they were, etc., and used that information to inform a
> publication quality score.
> - *Other Cities* - To get a sense of what high demand datasets were
> being released elsewhere and help inform departments of existing
> precedents, the team researched the data portals of four other major U.S.
> cities - Baltimore, Boston, Chicago, and New York City. Popular datasets
> not yet published by the City of Philadelphia were recorded as
> “unpublished” datasets.
> - *Demand / Impact* - The team used information derived from an
> analysis of over 2,800 Right to Know requests, voting on the Open Data
> Pipeline Trello board, and nominations on OpenDataPhilly.org to estimate
> demand for each dataset on a scale of 1-5 (5 being greatest).
> - *Cost / Complexity* - Information about the level of technical
> effort required to prepare each dataset for publishing was used to
> produce an estimate of cost/complexity on a scale of 1-5 (5 being
> greatest). A toy sketch combining these two scales follows this list.
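>
> A toy version of the demand/cost part of that matrix. The field names
> and the "highest demand, lowest cost first" ordering are my assumptions
> for illustration, not the Philadelphia team's actual method:
>
> from dataclasses import dataclass
>
> @dataclass
> class Candidate:
>     name: str
>     demand: int  # 1-5, 5 = greatest demand
>     cost: int    # 1-5, 5 = hardest to prepare
>
> def prioritise(candidates):
>     """Quick wins first: highest demand, then lowest cost/complexity."""
>     return sorted(candidates, key=lambda c: (-c.demand, c.cost))
>
> # Example backlog with invented datasets and scores.
> backlog = [
>     Candidate("property assessments", demand=5, cost=4),
>     Candidate("crime incidents", demand=5, cost=2),
>     Candidate("bike rack locations", demand=2, cost=1),
> ]
> for c in prioritise(backlog):
>     print(c.demand, c.cost, c.name)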
>
> *Steve Bennett:*
>
> "three criteria when pondering priorities for government data release:
> 1. Uniqueness: to what extent are there no other sources of this
> information? A council's collection of street
> information is valuable but there's a lot of overlap with OpenStreetMap,
> for instance. But no one else could have
> the garbage collection zone boundaries.
> 2. Maintenance. Datasets age pretty quickly, and a dataset that's more
> than a year out of date seems to go
> downhill in value pretty fast.
> 3. Reusability: was the data being collected with a general purpose in
> mind, or are there limitations due to the
> original purpose for which it was collected (eg, lack of
> comprehensiveness, idiosyncratic groupings, jurisdictional
> filtering...)"
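>
> Steve's maintenance criterion is the easiest one to automate. A minimal
> sketch, assuming each dataset's metadata carries a last-updated ISO date
> (the one-year threshold is his rule of thumb, not a standard):
>
> from datetime import date, timedelta
>
> STALE_AFTER = timedelta(days=365)
>
> def is_stale(last_updated, today=None):
>     """True if the dataset is more than a year out of date."""
>     today = today or date.today()
>     return today - date.fromisoformat(last_updated) > STALE_AFTER
>
> # Invented example metadata.
> datasets = {
>     "garbage-collection-zones": "2015-03-01",
>     "street-tree-inventory": "2013-06-30",
> }
> for name, updated in datasets.items():
>     flag = "STALE" if is_stale(updated, today=date(2015, 4, 20)) else "ok"
>     print(flag, name, updated)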
>
> *EU*
>
> (value as seen from a publisher's perspective):
>
> "a dataset may be considered of high - value when one or more of the
> following criteria are met
>
> It contributes to transparency:
> These datasets are published because they increase the transparency
> and openness of the government towards its citizens. For
> instance the publication of parliaments’ data, such as election
> results, or the way go vernmental budgets are spent, or staff cost
> of public administrations all contribute to the transparency of the
> w
> ay public administrations are working.
>
> Its publication is subject to a legal obligation:
> In some cases the publication of data is enforced by law.
> The PSI Directive, for instance, regulates the publication of
> policy-related documents by (semi-)public organisations.
>
> It directly or indirectly relates to their public task:
> A public administration may publish a dataset because it directly relates
> to its public task. For instance DG CLIMA may publish statistics
> on CO2-emission as part of its task for raising awareness about climate
> change.
>
> It realises a cost reduction:
> The availability and re-use of a dataset, e.g. contact information, code
> lists, reference data and controlled vocabularies, eliminates the
> need for duplication of data and effort, reduces costs and increases
> interoperability.
> Collections of data housed in base registers, and geospatial
> data, are prime examples of datasets whose opening up will lead to
> direct cost reductions in data management, production and exchange.
>
> The type and size of its target audience:
> A dataset may be useful for/relevant to a large audience (size-based
> value), for instance traffic data.
> On the other hand, a dataset may bring large value to a specific target
> audience (target/subject-based value), for instance a dataset
> containing data on particles colliding at high speed in a
> particle accelerator."
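>
> The report's rule ("one or more of the following criteria are met")
> reduces to an any-of predicate. A sketch where the flag names are my own
> shorthand for the five criteria, not the report's schema:
>
> CRITERIA = (
>     "contributes_to_transparency",
>     "publication_legally_required",
>     "relates_to_public_task",
>     "realises_cost_reduction",
>     "significant_target_audience",
> )
>
> def is_high_value(dataset):
>     """True if the dataset meets at least one of the five criteria."""
>     return any(dataset.get(flag, False) for flag in CRITERIA)
>
> print(is_high_value({"relates_to_public_task": True}))   # True
> print(is_high_value({"realises_cost_reduction": False})) # False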
>
>
> Cheers
> Cassie
>
>
>
>
> On Wed, Apr 15, 2015 at 5:41 PM, Steve Bennett <stevage at gmail.com> wrote:
>
>> On Tue, Apr 14, 2015 at 4:30 PM, Cassie Findlay
>> <findlay.cassie at gmail.com> wrote:
>>
>>> Has anyone come across good criteria or defined methods for identifying
>>> 'high value' datasets - if, for example, you are looking at a
>>> whole-of-government jurisdiction? I found some in this EU report
>>> <http://ec.europa.eu/isa/documents/publications/report-on-high-value-datasets-from-eu-institutions_en.pdf>
>>> but would like to gather some more.
>>>
>>> I realise that value is a highly subjective thing to assert (valuable
>>> for whom, why?) and really like Rosie's work on defining the problems
>>> first, in order to then work out where you might find datasets of value,
>>> but all that aside :) - are there examples out there of work to define high
>>> value stuff?
>>>
>>
>> Other people have commented on other frameworks etc for assessing value,
>> but informally, I've found myself focusing on three criteria when pondering
>> priorities for government data release:
>>
>> 1. Uniqueness: to what extent are there no other sources of this
>> information? A council's collection of street information is valuable but
>> there's a lot of overlap with OpenStreetMap, for instance. But no one else
>> could have the garbage collection zone boundaries.
>> 2. Maintenance. Datasets age pretty quickly, and a dataset that's more
>> than a year out of date seems to go downhill in value pretty fast.
>> 3. Reusability: was the data being collected with a general purpose in
>> mind, or are there limitations due to the original purpose for which it was
>> collected (eg, lack of comprehensiveness, idiosyncratic groupings,
>> jurisdictional filtering...)
>>
>> Steve
>>