[open-science] Fwd: [Open-access] Open Data Access Point in R
ted at trudat.co
Mon Dec 23 19:15:43 UTC 2013
This is a fascinating project. Do you have any other publications or
documents about it?
Have you heard of the Attempto Controlled English (ACE) project? It is a
knowledge representation language that is both human readable and machine
readable. It could be a useful example for your term thesaurus. ACE text can
be converted into OWL statements, and OWL can be serialized as RDF.
I couldn't do much on the subject.ro site. Is there some documentation
about how to use this site? The help link doesn't do anything.
Good luck with this project.
On Mon, Dec 23, 2013 at 12:45 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> ---------- Forwarded message ----------
> From: Christian Tzurcanu <christian.tzurcanu at gmail.com>
> Date: Mon, Dec 23, 2013 at 5:13 PM
> Subject: [Open-access] Open Data Access Point in R
> To: open-access at lists.okfn.org
> Dear members,
> My proposal would be to have a thesaurus of navigation for an open data
> catalog in multiple languages so I can plug it into
> http://subject.ro/index.php?uri=uat.rdf (as a uri)
> That way we can index data, messages/comments about the data, and offer
> metadata back into R.
> 1. Why bring open data to R as a priority?
> Because R has a very extensive library of algorithms and behavior that
> complements any data.
> 2. Why subject.ro? What is subject.ro / what it plans to be?
> We want to make it a general ontology based on the SKOS thesaurus format,
> allowing very easy human-led categorization while staying machine-readable.
> We plan it as a gateway to linked and open data, indexed by subject, on
> the web as well as in R.
> Our demo uses the astronomy ontology from their draft thesaurus (not yet
> complete with all URIs), so expect bugs on this side.
> 3. We would first begin with subject.ro data and behavior as a "portal"
> to linked data in R because that will bring qualitative dimensions into R
> (through controlled vocabularies). R is presently very good for quantitative
> data but lacks the ability to compute semantics.
> 4. subject.ro will keep its data available for replication and
> synchronization (for now in MySQL, but with plans for CouchDB, which is
> very good for distributed databases). We will have mobile and desktop apps
> for interfacing with this data, in addition to R and the website.
> We have extensive experience in programming for all platforms: web,
> mobile, desktop for all operating systems. But we will need more volunteer
> programmers for faster returns :)
> Now I would like to talk about what we have already done, so you can see
> the link to our plans for subject.ro:
> In http://sliced.ro/docs/docs/Science.html we have demoed some things
> that we would like in R.
> For each thesaurus we propose:
> -have all terms in the singular and, for the appropriate languages, in
> the masculine gender
> -have the least number of words per term (prefer hyphenated and composed
> -have only the eponyms capitalized
> -prefer the same number of words per term as the English term
> -prefer to include part of the inheritance in the term (a term should
> uniquely define the reality without the need to know its ancestry in the
> graph), or have 2 versions of the thesauri: one with intrinsic identity and
> one with possible extrinsic identity
> -there should be just one preferred term for each language
> -we also have to know for each term if it has single or multiple
> We should talk about each rule and I will tell you why I have reached
> these conclusions. They are not the only possible solution.
> Each language should have a function that forms a term's plural and
> feminine forms.
> There should be a function that takes a text and a language code and
> compiles a list of the terms it contains.
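A minimal sketch of such a term-extraction function, written in Python purely for illustration (the proposal targets R). The `THESAURUS` dictionary, its contents, and the name `find_terms` are hypothetical and not part of subject.ro. The sketch assumes terms are stored lowercase and in the singular, and it matches the longest phrase first.

```python
import re

# Hypothetical toy thesaurus: language code -> preferred terms (lowercase, singular).
THESAURUS = {
    "en": {"star", "galaxy", "spiral galaxy", "planet"},
}


def find_terms(text, lang):
    """Return the thesaurus terms found in text, in order of first appearance."""
    terms = THESAURUS.get(lang, set())
    words = re.findall(r"\w+", text.lower())
    # Longest term length in words; 0 when the language is unknown.
    max_len = max((len(t.split()) for t in terms), default=0)
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first so "spiral galaxy" wins over "galaxy".
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in terms:
                if candidate not in found:
                    found.append(candidate)
                i += n
                break
        else:
            # No term starts at this word; move on.
            i += 1
    return found


print(find_terms("A spiral galaxy contains many a star.", "en"))
```

Because the lookup is exact, a plural such as "stars" would not match here; that is where the per-language plural and feminine function would come in.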
> There should be a function that takes in a text, a language and a target
> language. It will return an exact translation for controlled terms and an
> approximate translation of the rest using Google Translate.
> As for Semantic Web processing: For any text+language:
> There should be a function that returns the greater common term: the term
> that contains all the other mentioned terms.
> There should be a function that returns the smallest distinctors (an
> invented idea): the terms that are the most detailed (the leaves in the
> thesaurus graph)
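One possible reading of these two functions, sketched in Python purely for illustration (the proposal targets R). The `BROADER` mapping and all function names are hypothetical. The sketch assumes each term has a single broader term, which a SKOS thesaurus does not guarantee. It treats the "greater common term" as the most specific term that is broader than or equal to every mentioned term, and the "smallest distinctors" as the mentioned terms that have no narrower term.

```python
# Hypothetical toy hierarchy: term -> its single broader term (None for the root).
BROADER = {
    "astronomical object": None,
    "galaxy": "astronomical object",
    "spiral galaxy": "galaxy",
    "elliptical galaxy": "galaxy",
    "star": "astronomical object",
}


def ancestors(term):
    """Return the path from term up to the root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = BROADER[term]
    return path


def greater_common_term(terms):
    """Return the most specific term that is broader than or equal to every given term."""
    terms = list(terms)
    if not terms:
        return None
    first_path = ancestors(terms[0])
    other_sets = [set(ancestors(t)) for t in terms[1:]]
    for candidate in first_path:  # walks from the most specific term up to the root
        if all(candidate in s for s in other_sets):
            return candidate
    return None


def smallest_distinctors(terms):
    """Return the given terms that have no narrower term in the hierarchy."""
    has_narrower = {parent for parent in BROADER.values() if parent is not None}
    return [t for t in terms if t not in has_narrower]


mentioned = ["spiral galaxy", "elliptical galaxy", "galaxy"]
print(greater_common_term(mentioned))
print(smallest_distinctors(mentioned))
```

With multiple broader terms per concept, the upward walk would have to become a graph search, and more than one common term could exist.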
> Thesauri data and all these functions should be available in R (in the
> subject.ro package).
> We need scientific guidance on where this technology should lead and what
> use cases can be derived. Please send feedback.
> Christian Tzurcanu, subject.ro
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
Co-founder of Trudat.co <http://trudat.co/>