[open-science] Fwd: [Open-access] Open Data Access Point in R

Ted Strauss ted at trudat.co
Mon Dec 23 19:18:08 UTC 2013

I forgot to include the link about the Attempto Controlled English project:
The ACE wiki is a great example application:

> This is a fascinating project. Do you have any other publications or
> documents about it?
> Have you heard of the Attempto Controlled English (ACE) project? It is a
> knowledge representation language that is both human readable and machine
> readable. It could be a useful example for your term thesaurus. It converts
> into OWL statements, which can be serialized as RDF.
> I couldn't do much on the subject.ro site. Is there some documentation
> about how to use this site? The help link doesn't do anything.
> Good luck with this project.
> Ted Strauss
> Trudat.co <http://trudat.co/>
> odx.io <http://odx.io/>
> @trudatted <https://twitter.com/trudatted>
>> From: Christian Tzurcanu <christian.tzurcanu at gmail.com>
>> Date: Mon, Dec 23, 2013 at 5:13 PM
>> Subject: [Open-access] Open Data Access Point in R
>> To: open-access at lists.okfn.org
>> Dear members,
>> My proposal would be to have a thesaurus of navigation for an open data
>> catalog in multiple languages so I can plug it into
>> http://subject.ro/index.php?uri=uat.rdf (as a uri)
>> That way we can index data, messages/comments about the data, and offer
>> metadata back into R.
>> 1. Why bring open data to R as a priority?
>> Because R has a very extensive library of algorithms and behavior that
>> complements any data.
>> 2. Why subject.ro? What is subject.ro, and what does it plan to be?
>> We want to make it a general ontology based on the SKOS thesaurus format,
>> for easy human-led categorization as well as machine readability.
>> We plan it as a gateway to linked and open data, indexed by subject, on
>> the web as well as in R.
>> Our demo uses the astronomy ontology from their draft thesaurus (not yet
>> completed with all URIs), so expect bugs on this side.
>> 3. We would first begin with subject.ro data and behavior as a "portal"
>> to linked data in R, because that will bring qualitative dimensions into R
>> (through controlled vocabularies). R is presently very good for
>> quantitative data but lacks the ability to compute semantics.
>> 4. subject.ro will keep its data available for replication and
>> synchronization (for now in MySQL, but with plans for CouchDB). We will
>> have mobile and desktop apps for interfacing with this data, as well as R
>> and the website. CouchDB is well suited to distributed databases.
>> We have extensive experience in programming for all platforms: web,
>> mobile, desktop for all operating systems. But we will need more volunteer
>> programmers for faster returns :)
>> Now I would like to talk about what we have already done, so you can see
>> the link to our plans for subject.ro:
>> In http://sliced.ro/docs/docs/Science.html we have demoed some things
>> that we would like in R.
>> For each thesaurus we propose:
>> - have all terms in the singular and, for languages with grammatical
>>   gender, in the masculine
>> - have the least number of words per term (prefer hyphenated and compound
>>   words)
>> - have only the eponyms capitalized
>> - prefer the same number of words per term as the English term
>> - prefer to include part of the inheritance in the term (a term should
>>   uniquely define the reality without the need to know its ancestry in the
>>   graph), or have 2 versions of the thesauri: one with intrinsic identity
>>   and one with possible extrinsic identity
>> - there should be just one preferred term for each language
>> - we also have to know for each term whether it has single or multiple
>>   inheritance
>> We should talk about each rule and I will tell you why I have reached
>> these conclusions. They are not the only possible solution.
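A few of the rules above are mechanical enough to check in code. The following is only a minimal sketch in Python (not R), with hypothetical data and function names that are not part of subject.ro; it assumes a thesaurus stored as a mapping from each term to its broader terms, and it counts a hyphenated word as one word.

```python
# Hypothetical toy thesaurus: term -> list of broader (parent) terms.
BROADER = {
    "astronomical object": [],
    "star": ["astronomical object"],
    "pulsar": ["neutron star", "radio source"],
}


def inheritance_kind(term, broader):
    """Classify a term as 'root', 'single' or 'multiple' inheritance."""
    parents = broader[term]  # raises KeyError for an unknown term
    if not parents:
        return "root"
    return "single" if len(parents) == 1 else "multiple"


def word_count(term):
    """Count whitespace-separated words; a hyphenated word counts as one."""
    return len(term.split())


def same_word_count(term, english_term):
    """Check the 'same number of words as the English term' preference."""
    return word_count(term) == word_count(english_term)
```

For example, `inheritance_kind("pulsar", BROADER)` reports multiple inheritance because the toy data lists two broader terms for it.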
>> Each language should have a function that forms a term's plural and
>> feminine forms.
>> There should be a function that takes in a text and a language code and
>> compiles a list of the controlled terms it contains.
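The term-listing function could look roughly like this. It is a sketch in Python (not R) under loud assumptions: the `THESAURUS` data and the name `find_terms` are hypothetical, and matching is a plain case-insensitive whole-word search.

```python
import re

# Hypothetical toy thesaurus: language code -> preferred terms (lowercase, singular).
THESAURUS = {
    "en": ["star", "binary star", "galaxy"],
    "ro": ["stea", "galaxie"],
}


def find_terms(text, lang):
    """Return the controlled terms of `lang` found in `text`, in thesaurus order."""
    lowered = text.lower()
    found = []
    for term in THESAURUS.get(lang, []):
        # Whole-word match, so "star" does not match inside "start".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(term)
    return found
```

Because the terms are stored in the singular, this sketch misses inflected forms such as "stars"; the per-language plural and feminine function would be needed to close that gap.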
>> There should be a function that takes in a text, a language and a target
>> language. It will return an exact translation for controlled terms and an
>> approximate translation of the rest using Google Translate.
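One way to combine exact and approximate translation is to split the text on controlled terms and send only the remainder to a machine translator. This is a sketch in Python with hypothetical names and data: `term_map` stands in for the thesaurus lookup between the two languages, and `fallback` is any caller-supplied function (for instance a wrapper around a translation service); no real translation API is called here.

```python
import re

# Hypothetical English -> Romanian term pairs; a real thesaurus would supply these.
EN_TO_RO = {"binary star": "stea binară", "galaxy": "galaxie"}


def translate(text, term_map, fallback):
    """Translate controlled terms exactly and everything else via `fallback`."""
    # Longest terms first, so a multi-word term wins over a shorter overlap.
    terms = sorted(term_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    # With one capturing group, re.split alternates: free text, term, free text, ...
    pieces = pattern.split(text)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            out.append(term_map[piece.lower()])  # exact, controlled translation
        elif piece:
            out.append(fallback(piece))  # approximate translation
    return "".join(out)
```

Passing `str.upper` as the fallback makes it easy to see which parts of a sentence were left to the approximate translator.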
>> As for Semantic Web processing, for any text+language:
>> There should be a function that returns the greatest common term: the
>> term that contains all the other mentioned terms.
>> There should be a function that returns the smallest distinctors (an
>> invented idea): the terms that are the most detailed (the leaves in the
>> thesaurus graph).
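Both functions reduce to walking the broader-term links of the thesaurus graph. The sketch below is in Python (not R) and rests on assumptions: the `BROADER` data and all function names are hypothetical, the common term is chosen as the shared broader term with the smallest total distance to the mentioned terms (a heuristic), and "smallest distinctors" is read as the mentioned terms that are not broader than any other mentioned term.

```python
# Hypothetical toy thesaurus: term -> list of broader (parent) terms.
BROADER = {
    "astronomical object": [],
    "star": ["astronomical object"],
    "binary star": ["star"],
    "variable star": ["star"],
    "galaxy": ["astronomical object"],
}


def ancestors_or_self(term):
    """Map `term` and each broader term reachable from it to its distance."""
    dist = {term: 0}
    queue = [term]
    while queue:
        current = queue.pop(0)
        for parent in BROADER.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def greatest_common_term(terms):
    """Return the closest term that is broader than or equal to every term."""
    maps = [ancestors_or_self(t) for t in terms]
    if not maps:
        return None
    common = set(maps[0])
    for m in maps[1:]:
        common &= set(m)
    if not common:
        return None
    # Smallest total distance wins; ties are broken alphabetically.
    return min(common, key=lambda c: (sum(m[c] for m in maps), c))


def smallest_distinctors(terms):
    """Return the mentioned terms that are not broader than another mentioned term."""
    result = []
    for t in terms:
        others = [o for o in terms if o != t]
        if not any(t in ancestors_or_self(o) for o in others):
            result.append(t)
    return result
```

Storing a list of broader terms per entry keeps the sketch usable for terms with multiple inheritance as well as single inheritance.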
>> Thesauri data and all these functions should be available in R (in the
>> subject.ro package).
>> We need scientific guidance on where this technology should lead and what
>> use cases can be derived. Please send feedback.
>> Christian Tzurcanu, subject.ro
Ted Strauss
Co-founder of Trudat.co <http://trudat.co/>
