[open-science] How should we publish survey/tabular data? A Panton Paper?

Mon Jul 18 17:20:58 UTC 2011

More abstract tools that are available include:

- Sumatra, which also stores the code and the outcomes of the analysis.
http://packages.python.org/Sumatra/introduction.html
- Cacher, which provides tools for caching and distributing statistical
analyses.
http://cran.r-project.org/web/packages/cacher/vignettes/cacher.pdf

And of course thedata.org is the easiest place to put a data set for
long-term storage, and getting a SHA-1 name for it is a bonus.
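
For anyone who hasn't met content-derived names before, here is a rough
sketch in Python of what a SHA-1 name amounts to. I'm assuming the name is
simply a hash of the file's bytes; the repository may well compute its
identifier differently, and sha1_name is just an illustrative helper, not
part of any tool mentioned here.

```python
import hashlib


def sha1_name(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file's bytes.

    Assumption: the "name" is a plain hash of the raw file contents.
    The file is read in chunks so large data sets need not fit in memory.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Anyone who downloads the file can recompute the digest and confirm they
have the same bytes that were deposited.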

In terms of how to write a data paper, I encourage you to look at
Jonathan Rees's short essay, "Recommendations for independent scholarly
publication of data sets", at
http://neurocommons.org/report/data-publication.pdf
It includes links to some journals that have been publishing "data
papers" for years.

And remember to document your workflow and present it in VisTrails too:
http://www.vistrails.org/
If we don't know how the researcher who collected the data processed it
before it got into tabular form, then the data is not very useful...
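
On the format-checking idea in the thread quoted below, a minimal sketch
of a content-based check in Python, which ignores the file name entirely.
The function name sniff_format is hypothetical, it only knows about GIF
and CSV, and the CSV test is a heuristic guess from the standard library,
not a validation.

```python
import codecs
import csv

# Every GIF file starts with one of these two signatures.
GIF_MAGIC = (b"GIF87a", b"GIF89a")


def sniff_format(path, sample_size=4096):
    """Guess a MIME type from the first bytes of a file, not its suffix."""
    with open(path, "rb") as handle:
        head = handle.read(sample_size)
    if head.startswith(GIF_MAGIC):
        return "image/gif"
    try:
        # The incremental decoder tolerates a multi-byte character that
        # was cut off at the end of the sample, but still rejects
        # bytes that are not valid UTF-8.
        text = codecs.getincrementaldecoder("utf-8")().decode(head)
        # Sniffer raises csv.Error when it cannot find a delimiter.
        csv.Sniffer().sniff(text, delimiters=",;\t")
    except (UnicodeDecodeError, csv.Error):
        return "unknown"
    return "text/csv"
```

So a CSV saved as something.dat would still come back as text/csv, and a
GIF with the wrong suffix as image/gif. A real service would need many
more formats and a stricter parse of the whole file.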

jtw

On 7/18/2011 9:01 AM, Andy Turner wrote:
> Inferring type is sometimes not straightforward, but file names can help... If publishing data though, one can be clear about the MIME type (http://en.wikipedia.org/wiki/Internet_media_type) for each different file download.
>
> For geospatial table data and MIME types for the main geospatial mark-up language GML there is the following documentation from the Open Geospatial Consortium (OGC):
> Tabular geospatial data for the OGC Table Joining Service (http://www.opengeospatial.org/standards/tjs) was expected as XML.
> For the latest GML Encoding, the MIME type allows for optional parameters for "version" and "charset" (http://portal.opengeospatial.org/files/?artifact_id=37743).
>
> I expect that something similar is wanted as well as converters for when the data is wanted in another format.
>
> Sorry, I've rushed this message, but the main point is that the format should be unambiguous regardless of the file name.
>
> Andy
> http://www.geog.leeds.ac.uk/people/a.turner/
>
> From: open-science-bounces at lists.okfn.org [mailto:open-science-bounces at lists.okfn.org] On Behalf Of Peter Murray-Rust
> Sent: 17 July 2011 09:09
> To: Frey J.G.
> Cc: open-science
> Subject: Re: [open-science] How should we publish survey/tabular data? A Panton Paper?
>
> Completely agreed. It should be relatively simple to tie up the apparent file suffixes and the content. Of course files called *.dat won't do much for us. But a *.CSV should read into a CSV library. We can probably detect that a *.gif reads into a GIF library.
> On Sat, Jul 16, 2011 at 10:42 PM, Frey J.G. <J.G.Frey at soton.ac.uk> wrote:
> Peter
> How about an automated service for checking data (file) formats? Difficult but useful!? Just for the very simple cases (data vs. image, etc.)
> Jeremy Frey
>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

-- 
John Wilbanks
VP for Science
Creative Commons
web: http://creativecommons.org/science
blog: http://scienceblogs.com/commonknowledge
twitter: @wilbanks