[open-science] the early-career guide to doing open science?

Stacy Konkiel stacy.konkiel at gmail.com
Fri Mar 16 15:11:51 UTC 2012


Hi Tom,

Have you approached your university's institutional repository? While
many are not as feature-rich as we've become accustomed to with
enterprise solutions, many do provide the option of versioning and, in
some cases, will provide large-scale file storage for free (no matter
the file type). They can be a great place to find preservation-quality
data storage, and the librarians who run them are generally more than
happy to work with folks who wish to make their data open.

What field of research are you in? Are you strictly computer science,
or does your research overlap with a subject area that has its own
subject repositor(y/ies) (such as astrophysics)?


Best,
Stacy Konkiel
E-Science Librarian
Indiana University

On Fri, Mar 16, 2012 at 11:02 AM, Tom Roche <Tom_Roche at pobox.com> wrote:
>
> summary: are there guides to, e.g., archiving and enabling access to
> science inputs and outputs? esp for the under-resourced, early-career
> scientist-in-training.
>
> details:
>
> I'm a former software engineer, now a graduate student in atmospheric
> modeling. My products as a computational scientist will continue to be
> "soft" (whether, e.g., code, documents, graphics), and will therefore
> have needs similar to those of open-source software (OSS) projects:
> e.g., version control, backup, public access. (Hence I still consider
> myself very much a software engineer, though my colleagues seem to see
> themselves as scientists who just happen to work with software--but
> that's a separate matter.) As a coder I have worked, and continue to
> work, on several OSS projects, and am fairly familiar with the various
> distributed version-control systems (DVCS, e.g., git) and cloud-based
> platforms for OSS development.
>
> I'd like to learn more about best practices (and, frankly, cheap
> practices :-) for similarly maintaining and (for want of a better
> term) "opening" one's scientific products, whether finished or under
> development. Ideally I'd like to also
>
> * keep one copy of important data on my cluster, and another in a
>  cloud repository
>
> * version important data as it's received and processed
>
> * version analytics (e.g., plots that take more than a minute, or that
>  require significant setup, to produce) as they are updated
>
> similar to the manner in which one uses syncable local and cloud DVCS
> for the code that processes and analyzes that data. I could then
>
> * point colleagues at the cloud repository for collaboration
>
> * reference a branch of my project as supplemental information for
>  publications
>
> * do "automated builds" of publications out of the repository, in the
>  manner that installable software is built from sources
>
> * incorporate branched data from others' repositories as needed
>
> I am currently hosting a small part of my current project on free OSS
> sites. But, unlike most straight-code projects, data (whether raw or
> processed) must also be managed, in volume. Unfortunately, the free
> sites of which I'm aware usually
>
> - provide what are, for me at least, small filespaces (scale ~= 1 GB).
>
> - disallow versioning of large files and binaries (e.g., netCDF data)
>
> Given my status, and the state of science funding in the US, free
> repositories are all I can afford for the foreseeable future. I would
> hope that one or more of the institutions with which I am affiliated
> would provide functionality suited to open-science workflows such as
> the above, but that does not seem forthcoming. (They seem much more
> interested in keeping all but officially-approved content "inside the
> firewall," which is partly understandable, but greatly restricts
> collaboration and openness.)
>
> I find the open ethos compelling in both domains--science and
> software--both normatively (esp for more policy-relevant science) and
> positively/pragmatically ("more eyes make shallow bugs"). Hence I'm
> hoping that there is already support for the type of workflows sketched
> above (and solutions to other problems for open scientists of which
> I am as yet blissfully unaware :-), and that folks out there can pass
> pointers to sites, groups, or docs describing their use.
>
> TIA, Tom Roche <Tom_Roche at pobox.com>
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science



