[open-science] github/R stack for the nomadic researcher

William Gunn william.gunn at gmail.com
Wed Apr 11 23:28:00 UTC 2012


To further muddy the waters, I note that BMC has made a
deal <http://blogs.openaccesscentral.com/blogs/bmcblog/entry/labarchives_and_biomed_central_a> with
this outfit called LabArchives to host datasets (limited to 100MB per
author, not publication) for BMC papers. One benefit of focusing the
resources on a small number of players would be to reduce this sort of
confusion.


William Gunn
+1 646 755 9862
http://synthesis.williamgunn.org/about/




On Wed, Apr 11, 2012 at 11:40 AM, Carl Boettiger <cboettig at gmail.com> wrote:

> To carry this thread a step further...
>
> I suspect most researchers will be apt to use something if they can be
> convinced that it's already accepted for use by the scientific community --
> whether or not it's specifically designed for scientists.  I'd agree with
> Jessy -- the internet is littered with projects that have tried to be a
> "facebook for scientists", a "github for scientists", etc., which are largely
> unsuccessful when the impression is that "no researcher I know & respect
> uses that". Meanwhile, I've seen many skeptics go from "no
> self-respecting researcher would use twitter" to "look at all these
> well-respected researchers using twitter!  I'd better get on the boat".
>
> I believe that having funders and journals visibly support, recommend, or
> require the use of a particular repository is also essential.  This has
> certainly helped repositories such as Dryad and figshare.  Meanwhile,
> alternate solutions will have to distinguish themselves from other players
> in this field, and the different repositories need to interact seamlessly
> so researchers don't have to worry about which one to submit to.  The
> DataONE project is probably a leading example of these efforts.
>
> So, perhaps we don't need a thedatahub.org for science, perhaps we do.
> What we really need, though, is established researchers using the service
> and funders and publishers recommending or requiring it.
>
> The Amazon S3 business mentioned on another recent thread is probably
> important as well, at least for very large datasets.  With examples like
> Titus's, or more visibly, NIH's recent announcement of freely hosting the 1000
> Genomes dataset <http://aws.amazon.com/1000genomes/> there (200
> terabytes), this seems like an important player.  Perhaps thedatahub.org
> could simply provide the option to post data to Amazon S3, instead of
> their own servers?  Alternatively, something like globusonline
> <https://www.globusonline.org/> with its terabyte/hr transfer rates could
> help to approach big data the old-fashioned way (as used by the US DOE
> supercomputing centers to share data).
>
> I've just started exploring thedatahub.org myself.  The API (essential if
> I am to be able to incorporate this into my workflow) looks great but is
> giving me a few challenges.  Amazon S3 has the advantage of a much wider
> developer community building tools to interact with its data storage,
> though somewhat of a disadvantage in cost...  I'd be curious to hear the
> experiences of others...
>
>
> - Carl
>
>
> On Wed, Apr 11, 2012 at 11:03 AM, Jessy Kate Schingler <jessy at jessykate.com> wrote:
>
>> to play devil's advocate :)
>>
>> i think sites like github and wordpress and all the other de facto hosted
>> tools are successful specifically *because* they cross community
>> boundaries, and as a result encourage cross pollination and collaboration,
>> and focus efforts and (human/dollar) support. if there were data hubs
>> for each possible community, then i'm worried we just end up with
>> fragmentation of efforts, and confusion on the part of the user ("gee, do i
>> post to the CS data hub or the web development data hub? oh whatever i'll
>> just do it later.").
>>
>> on the other hand, as scientists posting to sites like thedatahub, we
>> actually increase exposure to our data and probability of re-use/re-mixing,
>> and hopefully help to dispel the notion that there is anything mysterious
>> or special about "real" scientists' data. we're right in there with the
>> data nerds and the software developers and the database admins and the non
>> profits and the inter-governmentals sharing and refining and asking
>> questions about their data. seems better for us all...
>>
>> to be clear, if we need more human resources to support operations or
>> scale up what thedatahub.org is capable of handling, i think we should
>> definitely do that and am happy to help on the sysadmin side, but IMHO we
>> would reap greater rewards by creating a de facto place on the web that does
>> this well for all, than by setting up a separate community.
>>
>> my 2c!
>> jessy
>>
>>
>> On Wed, Apr 11, 2012 at 7:48 AM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>
>>>
>>>
>>> On Wed, Apr 11, 2012 at 7:45 AM, Jessy Kate Schingler <jessy at jessykate.com> wrote:
>>>
>>>> do people think a separate instance of ckan would be useful for the
>>>> open data/science community at large? or is it an issue of marketing what
>>>> we have (thedatahub) better?
>>>>
>>>> if the former, i'm happy to help w system administration, but it's not
>>>> obvious to me... curious what others think!
>>>>
>>>>
>>> I think we should have a separate science-datahub.. I showed datahub to
>>> the European Horizon2020 today - very briefly..
>>>
>>>
>>>> jessy
>>>>
>>>>
>>>> On Tue, Apr 10, 2012 at 1:25 AM, Mark Wainwright <mark.wainwright at okfn.org> wrote:
>>>>
>>>>> Yes indeed! Perhaps I could mention this submission that I threw
>>>>> together for the Open Repositories conference OR12
>>>>> (http://or2012.ed.ac.uk):
>>>>>
>>>>> http://ckan.okfnpad.org/or12
>>>>>
>>>>> My idea was that we could boot a new instance of ckan specialised for
>>>>> research papers (slightly facetiously called thepaperhub.org), but I
>>>>> don't know how easy this is, or whether there would be enthusiasm from
>>>>> someone technically literate to keep it running. (Volunteers?)
>>>>> Meantime thedatahub.org is a good option.
>>>>>
>>>>> I gather OR12 will be accepting/rejecting submissions on 16 April,
>>>>> incidentally.
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On 2 April 2012 20:01, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>>>>
>>>>> > On Mon, Apr 2, 2012 at 7:23 PM, Jessy Kate Schingler <jessy at jessykate.com> wrote:
>>>>> >>
>>>>> >> i agree on the dataforge front...  git doesn't handle large files well,
>>>>> >> and figshare, buzzdata etc. seem to be mostly for visual or tabular data
>>>>> >> sets. out of curiosity, as i'm starting to learn about thedatahub.com,
>>>>> >
>>>>> >
>>>>> > thedatahub.org I think
>>>>> >
>>>>> >>
>>>>> >> it seems rather perfect for data set management, and even has change
>>>>> >> lists for data sets, groups, user pages, etc. (especially if there were some
>>>>> >> command line tools so i could "commit" changes to my data set periodically
>>>>> >> and upload them :)).
>>>>> >>
>>>>> >> is there a reason people find ckan/thedatahub insufficient for data
>>>>> >> management needs? is it related to technical/features, or to people's
>>>>> >> familiarity and confidence around the longevity of the site?
>>>>> >
>>>>> >
>>>>> > It's history, I think. We should now be making the case for such a
>>>>> > repository and I don't think Figshare is it. I have rather neglected
>>>>> > datahub because the original CKAN was metadata-oriented.
>>>>> >
>>>>> > I'll be making the case in Europe next week that we badly need informal
>>>>> > repositories and maybe this is the time to push the datahub?
>>>>> >
>>>>> > P.
>>>>> >
>>>>> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Apr 2, 2012 at 12:05 AM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>>>> >>>
>>>>> >>> Tom,
>>>>> >>> This is a really valuable post. I feel your concerns directly. I have
>>>>> >>> copied in our new Panton fellows (though I am sure they read this list
>>>>> >>> anyway!)
>>>>> >>>
>>>>> >>> On Sun, Apr 1, 2012 at 11:16 PM, Tom Roche <Tom_Roche at pobox.com> wrote:
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> [apologies for length of post, but it's a big topic]
>>>>> >>>
>>>>> >>>
>>>>> >>> No apologies needed!
>>>>> >>>
>>>>> >>> I am giving an important presentation to Europe "Open Infrastructures
>>>>> >>> for Open Science" and Neelie Kroes and others will be there. I am getting my
>>>>> >>> thoughts together as I have to give the plenary that informs the rest of the
>>>>> >>> workshop. Currently my thoughts are:
>>>>> >>>
>>>>> >>> - Europe (and the world) is losing 10 billion + in unused and restricted
>>>>> >>>   data. (I said this to Hargreaves)
>>>>> >>> - We MUST have easily accessible research repositories, probably on a
>>>>> >>>   domain basis (Dryad, Pangaea, TARDIS, etc.)
>>>>> >>> - Institutional Repos do not work for STM and never will
>>>>> >>> - Mandates are a blunt weapon and so far have little effectiveness
>>>>> >>> - Non-Commercial destroys knowledge
>>>>> >>>
>>>>> >>> We must give the researchers something they want. Sourceforge does this
>>>>> >>> for code. I use Sourceforge (actually now Bitbucket and Github) several
>>>>> >>> times a day. All my code is backed up, shareable, reusable, validated etc.
>>>>> >>>
>>>>> >>> There must be a "Data forge" for Europe. Figshare was built by one
>>>>> >>> graduate student in one year. I would give 3rd year graduate students
>>>>> >>> funding to do this - it's a hundred times more cost effective than
>>>>> >>> repositories.
>>>>> >>>
>>>>> >>> I'd like to collect ideas on this list and present them next week
>>>>> >>> (11th). An OKF data manifesto for Open Science (in Europe). Who knows what
>>>>> >>> might come?
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>> --
>>>>> >>> Peter Murray-Rust
>>>>> >>> Reader in Molecular Informatics
>>>>> >>> Unilever Centre, Dep. Of Chemistry
>>>>> >>> University of Cambridge
>>>>> >>> CB2 1EW, UK
>>>>> >>> +44-1223-763069
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> open-science mailing list
>>>>> >>> open-science at lists.okfn.org
>>>>> >>> http://lists.okfn.org/mailman/listinfo/open-science
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Jessy
>>>>> >> http://jessykate.com
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mark Wainwright, CKAN Community Co-ordinator
>>>>> Open Knowledge Foundation http://okfn.org/
>>>>> Skype: m.wainwright
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> Carl Boettiger
> UC Davis
> http://www.carlboettiger.info/
>
>
>
>