[open-science] Fwd: [Open-access] Open Data Access Point in R

Mon Dec 23 19:15:43 UTC 2013

Hi Christian,
This is a fascinating project. Do you have any other publications or
documents about it?

Have you heard of the Attempto Controlled English (ACE) project? It is a
knowledge representation language that is both human readable and machine
readable. It could be a useful example for your term thesaurus. ACE text can
be translated into OWL statements, which can be serialized as RDF.

I couldn't do much on the subject.ro site. Is there some documentation
about how to use this site? The help link doesn't do anything.

Good luck with this project.

Ted Strauss

Trudat.co <http://trudat.co/>
odx.io <http://odx.io/>
@trudatted <https://twitter.com/trudatted>

On Mon, Dec 23, 2013 at 12:45 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:

>
>
> ---------- Forwarded message ----------
> From: Christian Tzurcanu <christian.tzurcanu at gmail.com>
> Date: Mon, Dec 23, 2013 at 5:13 PM
> Subject: [Open-access] Open Data Access Point in R
> To: open-access at lists.okfn.org
>
>
> Dear members,
>
> My proposal would be to have a thesaurus of navigation for an open data
> catalog in multiple languages so I can plug it into
> http://subject.ro/index.php?uri=uat.rdf (as a uri)
> That way we can index data, messages/comments about the data, and offer
> metadata back into R.
>
> 1. Why bring open data to R as a priority?
> Because R has a very extensive library of algorithms and behavior that
> complements any data.
>
> 2. Why subject.ro? What is subject.ro, and what does it plan to be?
> We want to make it a general ontology based on the SKOS Thesaurus format,
> easy for humans to categorize with and machine-readable as well.
> We plan it as a gateway to linked and open data, indexed by subject, on
> the web as well as in R.
> Our demo is for the ontology of Astronomy, based on their draft thesaurus
> (not completed with all URIs), so expect bugs from this side.
>
> 3. We would first begin with subject.ro data and behavior as a "portal"
> to linked data in R because that will bring qualitative dimensions into R
> (by controlled vocabularies). R is presently very good for quantitative
> data but lacks the ability to compute semantics.
>
> 4. subject.ro will keep its data available for replication and
> synchronization (for now in MySQL, but with plans for CouchDB). We will
> have mobile and desktop apps for interfacing with this data, as well as R
> and the website. CouchDB is very good for distributed databases.
> We have extensive experience in programming for all platforms: web,
> mobile, desktop for all operating systems. But we will need more volunteer
> programmers for faster returns :)
>
>
> Now I would like to talk about what we have already done, to show the
> link to our plans for subject.ro:
> in http://sliced.ro/docs/docs/Science.html we have demoed some things
> that we would like in R:
>
> For each thesaurus we propose:
> -have all terms in the singular and, for the appropriate languages, in
> the masculine gender
> -have the least number of words per term (prefer hyphenated and compound
> words)
> -have only the eponyms capitalized
> -prefer the same number of words per term as the English term
> -prefer to include in the term part of the inheritance (a term should
> uniquely define the reality without the need to know its ancestry in the
> graph) or have 2 versions of the thesauri: one with intrinsic identity and
> one with possible extrinsic identity
> -there should be just one preferred term for each language
> -we also have to know for each term if it has single or multiple
> inheritance
>
> We should talk about each rule and I will tell you why I have reached
> these conclusions. They are not the only possible solution.
>
> Each language should have a function able to form a term's plural and
> feminine forms.
> There should be a function that takes in a text and a language code and
> compiles a list of the terms it contains.
> There should be a function that takes in a text, a language and a target
> language. It will return an exact translation for controlled terms and an
> approximate translation of the rest using Google Translate.
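
To make sure I follow the term-listing function described above, here is a
minimal sketch of how I picture it. It is in Python only for brevity; the
same logic should carry over to R. The THESAURUS contents and the name
find_terms are my own placeholders, not anything taken from subject.ro, and
the optional trailing "s" is a crude stand-in for a real per-language plural
function.

```python
import re

# Hypothetical controlled vocabulary: language code -> preferred terms.
THESAURUS = {
    "en": ["star", "binary star", "galaxy", "spiral galaxy"],
    "ro": ["stea", "galaxie"],
}

def find_terms(text, lang):
    """Return the controlled terms of `lang` that occur in `text`."""
    lowered = text.lower()
    found = []
    # Check longer terms first so multi-word terms are listed before their parts.
    for term in sorted(THESAURUS.get(lang, []), key=len, reverse=True):
        # Whole-word match; the optional "s" only approximates English plurals.
        if re.search(r"\b" + re.escape(term) + r"s?\b", lowered):
            found.append(term)
    return found
```

For example, find_terms("Binary stars are common in a spiral galaxy.", "en")
would list all four English terms, since "star" and "galaxy" also occur
inside the longer ones. Whether nested terms should be reported separately
is a design choice the thesaurus rules would have to settle.
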
>
> As for Semantic Web processing: For any text+language:
> There should be a function that returns the greater common term: the term
> that contains all the other mentioned terms.
> There should be a function that returns the smallest distinctors (an
> invented idea): the terms that are the most detailed (the leaves in the
> thesaurus graph)
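
The two semantic functions above could be sketched roughly as follows, again
in Python for brevity. This assumes single inheritance (one broader term per
term), and the BROADER table and function names are hypothetical examples of
mine. I am also reading "smallest distinctors" as the mentioned terms that
are not broader than any other mentioned term, which may not be exactly what
you intend.

```python
# Hypothetical thesaurus fragment: term -> broader term (None for the root).
# Assumes single inheritance; multiple inheritance would need a graph search.
BROADER = {
    "astronomical object": None,
    "star": "astronomical object",
    "binary star": "star",
    "variable star": "star",
    "galaxy": "astronomical object",
}

def ancestry(term):
    """Return [term, its broader term, ..., root]."""
    path = []
    while term is not None:
        path.append(term)
        term = BROADER[term]
    return path

def greater_common_term(terms):
    """Most specific term that is equal to or broader than every given term."""
    terms = list(terms)
    common = set(ancestry(terms[0]))
    for t in terms[1:]:
        common &= set(ancestry(t))
    # Walk up from the first term; the first shared entry is the most specific.
    for candidate in ancestry(terms[0]):
        if candidate in common:
            return candidate
    return None

def smallest_distinctors(terms):
    """Mentioned terms that are not broader than any other mentioned term."""
    terms = set(terms)
    broader_of_mentioned = set()
    for t in terms:
        broader_of_mentioned.update(ancestry(t)[1:])
    return sorted(terms - broader_of_mentioned)
```

With this fragment, the greater common term of "binary star" and "variable
star" is "star", and the smallest distinctors of "star", "binary star" and
"galaxy" are "binary star" and "galaxy".
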
>
> Thesauri data and all these functions should be available in R (in the
> subject.ro package).
>
>
> We need scientific guidance on where this technology should lead and what
> use cases can be derived. Please send feedback.
>
> Christian Tzurcanu, subject.ro
>
>
>
> _______________________________________________
> open-access mailing list
> open-access at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/open-access
> Unsubscribe: https://lists.okfn.org/mailman/options/open-access
>
>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/open-science
> Unsubscribe: https://lists.okfn.org/mailman/options/open-science
>
>

-- 
Ted Strauss
Co-founder of Trudat.co <http://trudat.co/>