[okfn-discuss] Building the (Open) Data Ecosystem

Wed Apr 6 13:12:45 UTC 2011

I found an article via Twitter that seems very pertinent to this
discussion. I'll include the paragraphs that sum up the current data
situation and its drivers:

"When datasets were sparse and only connected to the lab that produced
them, we would brood every one of them, protect (patent) them and work
on them in isolation in order to 'sell' them as chickens, usually in the
form of a largely narrative article. Other scientists need to combine a
minimum of two existing publications to generate new eggs and breed more
chickens. However, chickens have become overabundant: more than 20
million articles exist in biomedicine alone. More recently, valuable
aggregations of data were brought online (for example, data sets in GEO,
curated databases such as SwissProt and locus-specific human gene
variation databases (locus-specific databases such as the Leiden Open
Variation Database LOVD)). Now, data (eggs) have become a direct source
of new in silico discoveries and a unit of scientific trade.

But the scientific market has no way to value eggs because the entire
system is built upon judging and exchanging chickens for acknowledgement
and credit (through citations and other measures of impact). On the
other hand, for effective and evidence-based breeding, we need the eggs
as well as information from the parent chickens to assess the value of
the eggs. This is where a major challenge lies: in the long overdue
adaptation in scholarly communication. The data-intensive science wave
that has come over us calls for innovative ways of data sharing,
stewardship and valuation. We must respect the connection between the
articles and the data and value both appropriately."

[Full article at:
http://www.nature.com/ng/journal/v43/n4/full/ng0411-281.html
(Nature Genetics 43, 281–283 (2011) doi:10.1038/ng0411-281)

"The value of data"

Abstract - "Data citation and the derivation of semantic constructs
directly from datasets have now both found their place in scientific
communication. The social challenge facing us is to maintain the value
of traditional narrative publications and their relationship to the
datasets they report upon while at the same time developing appropriate
metrics for citation of data and data constructs."]

On Fri, 2011-04-01 at 18:19 +0100, Peter Murray-Rust wrote:
> 
> 
> On Fri, Apr 1, 2011 at 4:10 PM, <koltzenburg at w4w.net> wrote:
>         hi Rufus, 
>         
>         
>         
>  
>         On Fri, 1 Apr 2011 14:45:18 +0100, Rufus Pollock wrote 
>         > Hi All, 
>         > 
>         
> ... 
> 
>         > 
>         > ## The Present: A One-Way Street 
>         > 
>         
>         > At the current time, the basic model for data processing is a
>         > "one way street". Sources of data, such as government, publish
>         > data out into the world, where (if we are lucky) it is
>         > processed by intermediaries such as app creators or analysts,
>         > before finally being consumed by end users.
>         >
>         > It is a one way street because there is no feedback loop, no
>         > sharing of data back to publishers and no sharing between
>         > intermediaries.
>         
> 
> Agreed. I have been working out these ideas at the Am. Chem. Soc. I
> came up with the term "asymmetric" - and this is well argued in
> Becky's chilling analysis. So Open Data is not just the crumbs that
> the peasants consume under the table.
> 
> I also addressed the ecosystem aspect and here I use terms like
> "bottom-up" and "web-democratic". For me this describes Wikipedia,
> OKF, OSM and my own seeds in the ecosystem, BlueObelisk
> http://en.wikipedia.org/wiki/Blue_Obelisk and Quixote
> http://quixote.wikispot.org/Front_Page. They have been designed (or
> have evolved) to have no centre and no hierarchy - they work by
> "rough consensus and running code".
> 
> Of course the ACS just continues to go ahead and copyright data
> (deliberately)...
> 
> 
>         > 
>         > So what should be different? 
>         > 
>         > ## The Future: An Ecosystem 
>         >
>         > With the introduction of data cycles we have a real ecosystem
>         > not a one way street and this ecosystem thrives on
>         > collaboration, componentization and open data.
>         > 
>         
> 
> This is exactly how Blue Obelisk and Quixote work.
> 
> The power of the ecosystem is that it can find very dilute resources
> and concentrate them (to use chemical terms). If there are (say)
> 100,000 chemists and 0.1% care about doing something for Openness then
> that's 100 activists and that is enough.
> 
> P.
> 
> 
> 
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069