[open-science] OKF tools: ckan.org, thedatahub.org
Peter Murray-Rust
pm286 at cam.ac.uk
Wed Apr 4 07:10:45 UTC 2012
On Wed, Apr 4, 2012 at 2:56 AM, Carl Boettiger <cboettig at gmail.com> wrote:
> Peter is certainly right about the huge & growing demand for data
> repositories. Forgive the cut & paste below, but for US based among us at
> least, the NSF has just put out a call <http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.htm?WT.mc_id=USNSF_179> for small ($250K) and medium ($1M) grants to do something like this. In my
> opinion it's not a bad checklist. Of course as this thread has already
> highlighted, half of the battle is just knowing what's already out there
> and getting it adopted...
>
Great. Presumably these have to be held by a US PI, but there is no reason
why the OKF should not contract to the PI. Carl - are you planning to
apply?
I think of companies like Kitware - an Open Source company for whom I have
great respect. They are the sort of org that I think the NSF might be
thinking about. But I think we should also try to identify other US
contacts. My comments on the material below:
>
> 1. * E-science collaboration environments (ESCE).* A comprehensive
> "big data" cyberinfrastructure is necessary to allow for broad communities
> of scientists and engineers to have access to diverse data and to the best
> and most usable inferential and visualization tools. Potential research
> areas include, but are not limited to:
>
>
> - Novel collaboration environments for diverse and distant groups of
> researchers and students to coordinate their work (e.g., through data and
> model sharing and software reuse, tele-presence capability, crowdsourcing,
> social networking capabilities) with greatly enhanced efficiency and
> effectiveness for the scientific collaboration;
>
OKF leads the world here. BTW, for those of you not in "grid-based"
computing: there is a huge need for social networking there. I went to India
for EU-India grid and wrote a position paper about how it should be a social
as well as a technical grid.
>
> - Automation of the discovery process (e.g., through machine learning,
> data mining, and automated inference);
>
Metadata => CKAN.
>
> - Automated modeling tools to provide multiple views of massive data
> sets that are useful to diverse disciplines;
>
I am more concerned with views on massive numbers of medium-size data
sets.
>
> - New data curation techniques for managing the complex and large flow
> of scientific output in a multitude of disciplines;
>
Possibly. IMO the problems are social first and technical second.
>
> - Development of systems and processes that efficiently incorporate
> autonomous anomaly and trend detection with human interaction, response,
> and reaction;
>
Yes. I did something on a very small scale with high-throughput
computational chemistry.
>
> - End-to-end systems that facilitate the development and use of
> scientific workflows and new applications;
>
This has been a rathole for many years. Tom Oinn and I have worked on
Taverna - the Manchester eScience workflow engine. I've developed my own for
chemistry. But there is no one-size-fits-all solution.
>
> - New approaches to development of research questions that might be
> pursued in light of access to heterogeneous, diverse, big data;
>
When we get the Open data there will be no shortage of people doing this!
>
> - New models for cross-disciplinary information fusion and knowledge
> sharing;
>
This again comes down to metadata and social aspects.
>
> - New approaches for effective data, knowledge, and model sharing and
> collaboration across multiple domains and disciplines;
>
The first new approach is to recognize that it needs funding for
sustainability!
>
> - Securing access to data using innovative techniques to prevent
> excessive replication of data to external entities;
>
Yes. This is 95% social.
>
> - Providing secure and controlled role-based access to centrally
> managed data environments;
>
Expensive and normally only possible for the people with the data. I pass
on these issues.
>
> - Remote operation, scheduling, and real-time access to distant
> instruments and data resources;
>
This is quite well advanced. It's big science.
>
> - Protection of privacy and maintenance of security in aggregated
> personal and proprietary data (e.g., de-identification);
>
Pass.
>
> - Generation of aggregated or summarized data sets for sharing and
> analyses across jurisdictional and other end user boundaries; and
>
Again, the OKF could tackle this.
>
> - E-publishing tools that provide unique access, learning, and
> development opportunities.
>
>
>
And this is a major area. There is a desperate need for Open tools. But it
needs a lot of work.

In summary:
My own emphasis would be to stress "long-tail" science. Big science
always gets the money. This should be a long-tail charter.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069