Peter Murray-Rust
Thu Dec 5 15:27:47 UTC 2013

Rafael Pezzi

>  On 05-12-2013 06:18, Peter Murray-Rust wrote:
>  My personal interests would include:
>  * problems of legacy data formats
>  * adding structure to semi-structured data
>  * indexing, particularly domain-specific information
> Peter,
> I would emphasize the need for open data formats, and also for open tools
> to work with these data.

Open data formats are not in doubt. But the absence of tools should not
hold us back. Much of this will be name-value pairs - even if the name is
not in an ontology it's useful. Thus:

species="Erithacus rubecula"

may not resolve against an RDF triple store but it's a lot better than zero.
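
As a minimal sketch of the point above (not an existing tool), the Python
below pulls name="value" pairs out of free text and flags whether each name
is in a known vocabulary. KNOWN_NAMES and extract_pairs are hypothetical
names invented for this illustration, and the set is only a stand-in for a
real ontology lookup. Pairs with unrecognised names are kept, not discarded.

```python
import re

# Hypothetical mini-vocabulary: a stand-in for a real ontology lookup.
KNOWN_NAMES = {"species", "genus"}

# Matches name="value" with optional whitespace around the equals sign.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def extract_pairs(text):
    """Return (name, value, in_vocabulary) for each name="value" found."""
    return [(name, value, name in KNOWN_NAMES)
            for name, value in PAIR.findall(text)]

pairs = extract_pairs('species="Erithacus rubecula" habitat="woodland"')
for name, value, known in pairs:
    print(name, value, "known" if known else "unresolved")
```

The "unresolved" pairs remain searchable as plain text, and they can be
mapped to an ontology later if one becomes available.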

> The repository will be of little or no value at all if the data is coded
> in an obscure format whose corresponding software is unavailable, off the
> market, or prohibitively expensive.

You misread me:
"problems of legacy data formats" did not mean I want to translate
something useful into obscure, binary, commercial, legally protected DRM
formats. There are enough chemical software companies that do that. I want
to translate obscure binary protected formats into Unicode-compliant ASCII
and, where possible, index them against known formal ontologies such as
Chemical Markup Language.

Improving the semantics of data will encourage people to build tools. They
are mutually dependent but we have to start somewhere.

> Here I point to the Science Code Manifesto <http://sciencecodemanifesto.org/>,
> which, I believe, must go hand in hand with any data repository.
> Furthermore, reproducible research also needs reproducible instruments and
> experiments, thus I am particularly interested in the design of scientific
> instrumentation, including CAD drawings, schematics, firmware, and
> specifications, which must be accessible through open repositories as well.

I'd strongly support this - but it's a lot of design and a lot of
implementation. And we are often starting from scanned pixel pages or
broken PDFs. Anything that takes those forward is IMO valuable.

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
