[open-science] the early-career guide to doing open science?

Angus Whyte a.whyte at ed.ac.uk
Fri Mar 16 16:23:48 UTC 2012


Tom,

The Digital Curation Centre provides how-to guides and other resources on 
topics relevant to archiving and enabling access at 
http://www.dcc.ac.uk/resources/how-guides

There might also be resources you would find useful at 
http://software.ac.uk/resources/preserving-software-resources

best wishes,

Angus Whyte

-- 
Dr Angus Whyte
Senior Institutional Support Officer
Digital Curation Centre
University of Edinburgh
Crichton St, Edinburgh EH8 9LE
+44-131-650-9986




On 16/03/2012 15:02, Tom Roche wrote:
> summary: are there guides to, e.g., archiving and enabling access to
> science inputs and outputs? esp for the under-resourced, early-career
> scientist-in-training.
>
> details:
>
> I'm a former software engineer, now a graduate student in atmospheric
> modeling. My products as a computational scientist will continue to be
> "soft" (whether, e.g., code, documents, graphics), and will therefore
> have needs similar to those of open-source software (OSS) projects:
> e.g., version control, backup, public access. (Hence I still consider
> myself very much a software engineer, though my colleagues seem to see
> themselves as scientists who just happen to work with software--but
> that's a separate matter.) As a coder I have worked, and continue to
> work, on several OSS projects, and am fairly familiar with the various
> distributed version-control systems (DVCS, e.g., git) and cloud-based
> platforms for OSS development.
>
> I'd like to learn more about best practices (and, frankly, cheap
> practices :-) for similarly maintaining and (for want of a better
> term) "opening" one's scientific products, whether finished or under
> development. Ideally I'd like to also
>
> * keep one copy of important data on my cluster, and another in a
>    cloud repository
>
> * version important data as it's received and processed
>
> * version analytics (e.g., plots that take more than a minute, or that
>    require significant setup, to produce) as they are updated
>
> similar to the manner in which one uses syncable local and cloud DVCS
> for the code that processes and analyzes that data. I could then
>
> * point colleagues at the cloud repository for collaboration
>
> * reference a branch of my project as supplemental information for
>    publications
>
> * do "automated build" of publications out of the repository, in the
>    manner that installable software is built from sources
>
> * incorporate branched data from others' repositories as needed
>
> I am currently hosting a small part of my current project on free OSS
> sites. But, unlike most straight-code projects, data (whether raw or
> processed) must also be managed, in volume. Unfortunately, the free
> sites of which I'm aware usually
>
> - provide what are, for me at least, small filespaces (scale ~= 1 GB).
>
> - disallow versioning of large files and binaries (e.g., netCDF data)
>
> Given my status, and the state of science funding in the US, free
> repositories are all I can afford for the foreseeable future. I would
> hope that one or more of the institutions with which I am affiliated
> would provide functionality suited to open-science workflows such as
> the above, but that does not seem forthcoming. (They seem much more
> interested in keeping all but officially-approved content "inside the
> firewall," which is partly understandable, but greatly restricts
> collaboration and openness.)
>
> I find the open ethos compelling in both domains--science and
> software--both normatively (esp for more policy-relevant science) and
> positively/pragmatically ("more eyes make shallow bugs"). Hence I'm
> hoping that there is already support for the type of workflows sketched
> above (and solutions to other problems for open scientists of which
> I am as yet blissfully unaware :-), and that folks out there can pass
> pointers to sites, groups, or docs describing their use.
>
> TIA, Tom Roche <Tom_Roche at pobox.com>
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science
>



*** New Book: Graham Pryor (Ed.) 'Managing Research Data' Facet Publishing 2011
http://www.facetpublishing.co.uk/title.php?id=7562&category_code=810


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




