[open-science] github/R stack for the nomadic researcher

Mon Apr 16 09:40:10 UTC 2012

Incidentally, though I thought an instance specialised for research
publications might be helpful, I wouldn't particularly suggest
fragmenting it by discipline (science, etc).

Carl, glad that your first impressions of the CKAN API are good. If it
is also causing you problems you might want to report/discuss them on
the CKAN lists:

http://lists.okfn.org/mailman/listinfo/ckan-dev
http://lists.okfn.org/mailman/listinfo/ckan-discuss

For technical questions/problems, the -dev list is best.

Regards,

Mark

On 11 April 2012 19:40, Carl Boettiger <cboettig at gmail.com> wrote:
> To carry this thread a step further...
>
> I suspect most researchers will be apt to use something if they can be
> convinced that it's already accepted for use by the scientific community --
> whether or not it's specifically designed for scientists.  I'd agree with
> Jessy -- the internet is littered with projects that have tried to be a
> "facebook for scientists", a "github for scientists", etc., which are largely
> unsuccessful when the impression is that "no researcher I know & respect
> uses that". Meanwhile, I've seen many skeptics go from "no self-respecting
> researcher would use twitter" to "look at all these well-respected
> researchers using twitter!  I better get on the boat".
>
> I believe that having funders and journals visibly support, recommend, or
> require the use of a particular repository is also essential.  This has
> certainly helped repositories such as Dryad and figshare.  Meanwhile,
> alternate solutions will have to distinguish themselves from other players
> in this field, and the different repositories need to interact seamlessly so
> researchers don't have to worry about which one to submit to.  The DataONE
> project is probably a leading example of these efforts.
>
> So, perhaps we don't need a thedatahub.org for science, perhaps we do.  What
> we really need, though, is established researchers using the service and
> funders and publishers recommending or requiring it.
>
> The Amazon S3 business mentioned on another recent thread is probably
> important as well, at least for very large datasets.  With examples like
> Titus's, or more visibly, NIH's recent announcement of freely hosting the
> 1000 Genomes dataset there (200 terabytes), this seems like an important
> player.  Perhaps thedatahub.org could simply provide the option for data to
> post to Amazon S3, instead of their own servers?  Alternatively something
> like globusonline with its terabyte/hr transfer rates could help to approach
> big data the old-fashioned way (as used by the US DOE supercomputing centers
> to share data).
>
> I've just started exploring thedatahub.org myself.  The API (essential if I
> am to be able to incorporate this into my workflow) looks great but is
> giving me a few challenges.  AmazonS3 has the advantage of a much wider
> developer community building tools to interact with its data storage, though
> somewhat of a disadvantage in cost...  I'd be curious to hear the
> experiences of others...
>
>
> - Carl
>
>
> On Wed, Apr 11, 2012 at 11:03 AM, Jessy Kate Schingler <jessy at jessykate.com>
> wrote:
>>
>> to play devil's advocate :)
>>
>> i think sites like github and wordpress and all the other de facto hosted
>> tools are successful specifically *because* they cross community boundaries,
>> and as a result encourage cross pollination and collaboration, and focus
>> efforts and (human/dollar) support. if there were data hubs for each
>> possible community, then i'm worried we just end up with fragmentation of
>> efforts, and confusion on the part of the user ("gee, do i post to the CS
>> data hub or the web development data hub? oh whatever i'll just do it
>> later.").
>>
>> on the other hand, as scientists posting to sites like thedatahub, we
>> actually increase exposure to our data and probability of re-use/re-mixing,
>> and hopefully help to dispel the notion that there is anything mysterious or
>> special about "real" scientists' data. we're right in there with the data
>> nerds and the software developers and the database admins and the non
>> profits and the inter-governmentals sharing and refining and asking
>> questions about their data. seems better for us all...
>>
>> to be clear, if we need more human resources to support operations or
>> scale up what thedatahub.org is capable of handling, i think we should
>> definitely do that and am happy to help on the sysadmin side, but IMHO we
>> would reap greater rewards by creating a de facto place on the web that does
>> this well for all, than by setting up a separate community.
>>
>> my 2c!
>> jessy
>>
>>
>> On Wed, Apr 11, 2012 at 7:48 AM, Peter Murray-Rust <pm286 at cam.ac.uk>
>> wrote:
>>>
>>>
>>>
>>> On Wed, Apr 11, 2012 at 7:45 AM, Jessy Kate Schingler
>>> <jessy at jessykate.com> wrote:
>>>>
>>>> do people think a separate instance of ckan would be useful for the open
>>>> data/science community at large? or is it an issue of marketing what we have
>>>> (thedatahub) better?
>>>>
>>>> if the former, i'm happy to help w system administration, but it's not
>>>> obvious to me... curious what others think!
>>>>
>>>
>>> I think we should have a separate science-datahub.. I showed datahub to
>>> the European Horizon2020 today - very briefly..
>>>
>>>>
>>>> jessy
>>>>
>>>>
>>>> On Tue, Apr 10, 2012 at 1:25 AM, Mark Wainwright
>>>> <mark.wainwright at okfn.org> wrote:
>>>>>
>>>>> Yes indeed! Perhaps I could mention this submission that I threw
>>>>> together for the Open Repositories conference OR12
>>>>> (http://or2012.ed.ac.uk):
>>>>>
>>>>> http://ckan.okfnpad.org/or12
>>>>>
>>>>> My idea was that we could boot a new instance of ckan specialised for
>>>>> research papers (slightly facetiously called thepaperhub.org), but I
>>>>> don't know how easy this is, or whether there would be enthusiasm from
>>>>> someone technically literate to keep it running. (Volunteers?)
>>>>> Meantime thedatahub.org is a good option.
>>>>>
>>>>> I gather OR12 will be accepting/rejecting submissions on 16 April,
>>>>> incidentally.
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On 2 April 2012 20:01, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>>>>
>>>>> > On Mon, Apr 2, 2012 at 7:23 PM, Jessy Kate Schingler
>>>>> > <jessy at jessykate.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> i agree on the dataforge front...  git doesn't handle large files
>>>>> >> well,
>>>>> >> and figshare, buzzdata etc. seem to be mostly for visual or tabular
>>>>> >> data
>>>>> >> sets. out of curiosity, as i'm starting to learn about
>>>>> >> thedatahub.com,
>>>>> >
>>>>> >
>>>>> > thedatahub.org I think
>>>>> >
>>>>> >>
>>>>> >> it seems rather perfect for data set management, and even has
>>>>> >> change lists for data sets, groups, user pages, etc. (especially
>>>>> >> if there were some command line tools so i could "commit" changes
>>>>> >> to my data set periodically and upload them :)).
>>>>> >>
>>>>> >> is there a reason people find ckan/thedatahub insufficient for data
>>>>> >> management needs? is it related to technical/features, or to
>>>>> >> people's familiarity and confidence around the longevity of the site?
>>>>> >
>>>>> >
>>>>> > It's history, I think. We should now be making the case for such a
>>>>> > repository and I don't think Figshare is it. I have rather neglected
>>>>> > datahub because the original CKAN was metadata-oriented.
>>>>> >
>>>>> > I'll be making the case in Europe next week that we badly need
>>>>> > informal
>>>>> > repositories and maybe this is the time to push the datahub?
>>>>> >
>>>>> > P.
>>>>> >
>>>>> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Apr 2, 2012 at 12:05 AM, Peter Murray-Rust <pm286 at cam.ac.uk>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Tom,
>>>>> >>> This is a really valuable post. I feel your concerns directly. I
>>>>> >>> have
>>>>> >>> copied in our new Panton fellows (though I am sure they read this
>>>>> >>> list
>>>>> >>> anyway!)
>>>>> >>>
>>>>> >>> On Sun, Apr 1, 2012 at 11:16 PM, Tom Roche <Tom_Roche at pobox.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> [apologies for length of post, but it's a big topic]
>>>>> >>>
>>>>> >>>
>>>>> >>> No apologies needed!
>>>>> >>>
>>>>> >>> I am giving an important presentation to Europe "Open
>>>>> >>> Infrastructures
>>>>> >>> for Open Science" and Neelie Kroes and others will be there. I am
>>>>> >>> getting my
>>>>> >>> thoughts together as I have to give the plenary that informs the
>>>>> >>> rest of the
>>>>> >>> workshop. Currently my thoughts are:
>>>>> >>>
>>>>> >>> Europe (and the world) is losing 10 billion + in unused and
>>>>> >>> restricted
>>>>> >>> data. (I said this to Hargreaves)
>>>>> >>> We MUST have easily accessible research repositories, probably on a
>>>>> >>> domain basis (Dryad, Pangaea, TARDIS, etc.)
>>>>> >>> Institutional Repos do not work for STM and never will
>>>>> >>> Mandates are a blunt weapon and so far have little effectiveness
>>>>> >>> Non-Commercial destroys knowledge
>>>>> >>>
>>>>> >>> We must give the researchers something they want. Sourceforge does
>>>>> >>> this
>>>>> >>> for code. I use Sourceforge (actually now Bitbucket and Github)
>>>>> >>> several
>>>>> >>> times a day. All my code is backed up, shareable, reusable,
>>>>> >>> validated etc.
>>>>> >>>
>>>>> >>> There must be a "Data forge" for Europe. Figshare was built by one
>>>>> >>> graduate student in one year. I would give 3rd year graduate
>>>>> >>> students
>>>>> >>> funding to do this - it's a hundred times more cost effective than
>>>>> >>> repositories.
>>>>> >>>
>>>>> >>> I'd like to collect ideas on this list and present them next week
>>>>> >>> (11th). An OKF data manifesto for Open Science (in Europe)? Who
>>>>> >>> knows what might come?
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>> --
>>>>> >>> Peter Murray-Rust
>>>>> >>> Reader in Molecular Informatics
>>>>> >>> Unilever Centre, Dep. Of Chemistry
>>>>> >>> University of Cambridge
>>>>> >>> CB2 1EW, UK
>>>>> >>> +44-1223-763069
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> open-science mailing list
>>>>> >>> open-science at lists.okfn.org
>>>>> >>> http://lists.okfn.org/mailman/listinfo/open-science
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Jessy
>>>>> >> http://jessykate.com
>>>>> >>
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mark Wainwright, CKAN Community Co-ordinator
>>>>> Open Knowledge Foundation http://okfn.org/
>>>>> Skype: m.wainwright
>>>>
>>>
>>
>
>
>
> --
> Carl Boettiger
> UC Davis
> http://www.carlboettiger.info/
>

-- 
Mark Wainwright, CKAN Community Co-ordinator
Open Knowledge Foundation http://okfn.org/
Skype: m.wainwright