[open-science] How should we publish survey/tabular data? A Panton Paper?
wilbanks at creativecommons.org
Mon Jul 18 17:20:58 UTC 2011
More abstract tools that are available include:
- Sumatra, which also stores the code and the outcomes of the analysis.
- Cacher, which provides tools for caching and distributing statistical
And of course thedata.org is the easiest place to put a data set for
long-term storage, and getting a SHA-1 name is a bonus.
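
For anyone who wants a comparable content-derived name locally, here is a minimal sketch of hashing a data file with SHA-1 using Python's standard library. It makes no claim about how thedata.org computes its own identifiers; the function name sha1_of_file and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the bytes stored at `path`.

    Hypothetical helper: the file is read in chunks so that large
    data sets do not have to fit in memory.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest depends only on the bytes, two copies of the same file get the same name wherever they are stored, and any change to the file produces a different one.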
In terms of how to write a data paper, I encourage you to look at
Jonathan Rees's short essay on "Recommendations for independent scholarly
publication of data sets" at
http://neurocommons.org/report/data-publication.pdf , which includes links to
some journals that have been publishing "data papers" for years.
And remember to document your workflow and present it in VisTrails too.
http://www.vistrails.org/ - if we don't know how the researcher who
collected the data processed it before it got into tabular form, then
it's not very useful...
On 7/18/2011 9:01 AM, Andy Turner wrote:
> Inferring type is sometimes not straightforward, but file names can help... If publishing data though, one can be clear about the MIME type (http://en.wikipedia.org/wiki/Internet_media_type) for each different file download.
> For geospatial table data and MIME types for the main geospatial mark-up language GML there is the following documentation from the Open Geospatial Consortium (OGC):
> Tabular geospatial data for the OGC Table Joining Service (http://www.opengeospatial.org/standards/tjs) was expected as XML.
> For the latest GML Encoding, the MIME type allows for optional parameters for "version" and "charset" (http://portal.opengeospatial.org/files/?artifact_id=37743).
> I expect that something similar is wanted as well as converters for when the data is wanted in another format.
> Sorry, I've rushed this message, but the main point is that the format should be unambiguous regardless of the file name.
> From: open-science-bounces at lists.okfn.org [mailto:open-science-bounces at lists.okfn.org] On Behalf Of Peter Murray-Rust
> Sent: 17 July 2011 09:09
> To: Frey J.G.
> Cc: open-science
> Subject: Re: [open-science] How should we publish survey/tabular data? A Panton Paper?
> Completely agreed. It should be relatively simple to tie up the apparent file suffixes and the content. Of course files called *.dat won't do much for us. But a *.CSV should read into a CSV library. We can probably detect that a *.gif reads into a GIF library.
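
A rough sketch of the suffix-versus-content check described above, assuming only two formats (GIF and CSV) and Python's standard library. The names CHECKS, looks_like_gif, looks_like_csv and suffix_matches_content are hypothetical, and the CSV test (valid UTF-8, at least two rows, a constant column count greater than one) is a heuristic chosen for illustration, not a definition of CSV.

```python
import csv
import io
import os

# The two signatures that open every GIF file.
GIF_MAGIC = (b"GIF87a", b"GIF89a")


def looks_like_gif(data):
    """True if the bytes start with a GIF signature."""
    return data[:6] in GIF_MAGIC


def looks_like_csv(data, min_rows=2):
    """Heuristic: UTF-8 text that parses into rows of equal width (> 1)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    try:
        rows = [row for row in csv.reader(io.StringIO(text, newline="")) if row]
    except csv.Error:
        return False
    if len(rows) < min_rows:
        return False
    width = len(rows[0])
    return width > 1 and all(len(row) == width for row in rows)


# Map each known suffix to the check its content should pass.
CHECKS = {".gif": looks_like_gif, ".csv": looks_like_csv}


def suffix_matches_content(filename, data):
    """Return True/False for known suffixes, None for unknown ones (e.g. .dat)."""
    suffix = os.path.splitext(filename)[1].lower()
    check = CHECKS.get(suffix)
    if check is None:
        return None
    return check(data)
```

Returning None for an unknown suffix keeps "cannot tell" separate from "suffix and content disagree", which is the case a checking service would want to flag.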
> On Sat, Jul 16, 2011 at 10:42 PM, Frey J.G.<J.G.Frey at soton.ac.uk<mailto:J.G.Frey at soton.ac.uk>> wrote:
> How about an automated check data (file) format service? Difficult but useful!? Just for the very simple data vs image etc
> Jeremy Frey
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
VP for Science