[open-science] OKF tools: ckan.org, thedatahub.org

Wed Apr 4 07:10:45 UTC 2012

On Wed, Apr 4, 2012 at 2:56 AM, Carl Boettiger <cboettig at gmail.com> wrote:

> Peter is certainly right about the huge & growing demand for data
> repositories.  Forgive the cut & paste below, but for US based among us at
> least, the NSF has just put out a call <http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.htm?WT.mc_id=USNSF_179> for small ($250K) and medium ($1M) grants to do something like this.  In my
> opinion it's not a bad checklist.  Of course as this thread has already
> highlighted, half of the battle is just knowing what's already out there
> and getting it adopted...
>
Great. Presumably these have to be held by a US PI, but there is no reason
why the OKF should not contract to the PI. Carl, are you planning to
apply?

I am thinking of companies like Kitware, an Open Source company for which I
have great respect. They are the sort of organization I think the NSF might
have in mind. But I think we should also try to identify other US
contacts. My comments on the material below:

>
>    1. *E-science collaboration environments (ESCE).* A comprehensive
>    "big data" cyberinfrastructure is necessary to allow for broad communities
>    of scientists and engineers to have access to diverse data and to the best
>    and most usable inferential and visualization tools. Potential research
>    areas include, but are not limited to:
>
>
>    - Novel collaboration environments for diverse and distant groups of
>    researchers and students to coordinate their work (e.g., through data and
>    model sharing and software reuse, tele-presence capability, crowdsourcing,
>    social networking capabilities) with greatly enhanced efficiency and
>    effectiveness for the scientific collaboration;
>
OKF leads the world here. BTW, for those of you not in "grid-based"
computing, there is a huge need for social networking. I went to India for
the EU-India grid and wrote a position paper arguing that it should be a
social as well as a technical grid.

>
>    - Automation of the discovery process (e.g., through machine learning,
>    data mining, and automated inference);
>
Metadata => CKAN.

>
>    - Automated modeling tools to provide multiple views of massive data
>    sets that are useful to diverse disciplines;
>
I am more concerned with views on massive numbers of medium-sized data
sets.

>
>    - New data curation techniques for managing the complex and large flow
>    of scientific output in a multitude of disciplines;
>
Possibly. IMO the problems are social first and technical second.

>
>    - Development of systems and processes that efficiently incorporate
>    autonomous anomaly and trend detection with human interaction, response,
>    and reaction;
>
Yes. I did something along these lines on a very small scale with
high-throughput computational chemistry.

>
>    - End-to-end systems that facilitate the development and use of
>    scientific workflows and new applications;
>
This has been a rathole for many years. Tom Oinn and I have worked on
Taverna, the Manchester eScience workflow engine. I've developed my own for
chemistry. But there is no one-size-fits-all solution.

>
>    - New approaches to development of research questions that might be
>    pursued in light of access to heterogeneous, diverse, big data;
>
When we get the Open data there will be no shortage of people doing this!

>
>    - New models for cross-disciplinary information fusion and knowledge
>    sharing;
>
This again comes down to metadata and social aspects.

>
>    - New approaches for effective data, knowledge, and model sharing and
>    collaboration across multiple domains and disciplines;
>
The first new approach is to recognize that it needs funding for
sustainability!

>
>    - Securing access to data using innovative techniques to prevent
>    excessive replication of data to external entities;
>
Yes. This is 95% social.

>
>    - Providing secure and controlled role-based access to centrally
>    managed data environments;
>
Expensive, and normally only possible for the people who hold the data. I
pass on these issues.

>
>    - Remote operation, scheduling, and real-time access to distant
>    instruments and data resources;
>
This is quite well advanced. It's big science.

>
>    - Protection of privacy and maintenance of security in aggregated
>    personal and proprietary data (e.g., de-identification);
>
Pass.

>
>    - Generation of aggregated or summarized data sets for sharing and
>    analyses across jurisdictional and other end user boundaries; and
>
Again, the OKF could tackle this.

>
>    - E-publishing tools that provide unique access, learning, and
>    development opportunities.
>
>
>

And this is a major area. There is a desperate need for Open tools, but it
needs a lot of work.

In summary:
My own emphasis would be to stress "long-tail" science. Big science
always gets the money. This should be a long-tail charter.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069