[open-science] Fwd: [Open-access] Open Data Access Point in R

Ted Strauss ted at trudat.co
Mon Dec 23 19:18:08 UTC 2013


I forgot to include the link about the Attempto controlled english project:
http://attempto.ifi.uzh.ch/site/
The ACE wiki is a great example application:
http://attempto.ifi.uzh.ch/acewiki/


On Mon, Dec 23, 2013 at 2:15 PM, Ted Strauss <ted at trudat.co> wrote:

> Hi Christian,
> This is a fascinating project. Do you have any other publications or
> documents about it?
>
> Have you heard of the Attempto Controlled English (ACE) project? It is a
> knowledge representation language that is both human readable and machine
> readable. It could be a useful example for your term thesaurus. ACE text
> can be translated into OWL statements, which can be serialized as RDF.
>
> I couldn't do much on the subject.ro site. Is there some documentation
> about how to use this site? The help link doesn't do anything.
>
> Good luck with this project.
>
> Ted Strauss
>
> Trudat.co <http://trudat.co/>
> odx.io <http://odx.io/>
> @trudatted <https://twitter.com/trudatted>
>
>
>
>
> On Mon, Dec 23, 2013 at 12:45 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
>>
>>
>> ---------- Forwarded message ----------
>> From: Christian Tzurcanu <christian.tzurcanu at gmail.com>
>> Date: Mon, Dec 23, 2013 at 5:13 PM
>> Subject: [Open-access] Open Data Access Point in R
>> To: open-access at lists.okfn.org
>>
>>
>> Dear members,
>>
>> My proposal would be to have a thesaurus of navigation for an open data
>> catalog in multiple languages so I can plug it into
>> http://subject.ro/index.php?uri=uat.rdf (as a uri)
>> That way we can index data, messages/comments about the data, and offer
>> metadata back into R.
>>
>> 1. Why bring open data to R as a priority?
>> Because R has a very extensive library of algorithms and behavior that
>> complements any data.
>>
>> 2. Why subject.ro? What is subject.ro, and what is it planned to be?
>> We want to make it a general ontology based on the SKOS thesaurus format,
>> so that categorization is easy for humans and the result is machine-readable.
>> We plan it as a gateway to linked and open data, indexed by subject, on
>> the web as well as in R.
>> Our demo covers the ontology of astronomy, based on a draft astronomy
>> thesaurus (not yet completed with all URIs), so expect bugs from this side.
>>
>> 3. We would first begin with subject.ro data and behavior as a "portal"
>> to linked data in R, because that will bring qualitative dimensions into R
>> (through controlled vocabularies). R is presently very good for
>> quantitative data but lacks the ability to compute semantics.
>>
>> 4. subject.ro will keep its data available for replication and
>> synchronization (for now in MySQL, but with plans for CouchDB, which is
>> well suited to distributed databases). We will have mobile and desktop
>> apps for interfacing with this data, as well as R and the website.
>> We have extensive experience in programming for all platforms: web,
>> mobile, and desktop, on all operating systems. But we will need more
>> volunteer programmers for faster returns :)
>>
>>
>> Now I would like to describe what we have already done, to show how it
>> links to our plans for subject.ro.
>> In http://sliced.ro/docs/docs/Science.html we have demoed some things
>> that we would like in R.
>>
>> For each thesaurus we propose:
>> - have all terms in the singular and, for languages with grammatical
>> gender, in the masculine
>> - have the fewest possible words per term (prefer hyphenated and compound
>> words)
>> - capitalize only eponyms
>> - prefer the same number of words per term as the English term
>> - prefer to include part of the inheritance in the term (a term should
>> uniquely define the reality without the need to know its ancestry in the
>> graph), or have 2 versions of the thesauri: one with intrinsic identity and
>> one with possible extrinsic identity
>> - have just one preferred term for each language
>> - record for each term whether it has single or multiple inheritance
>>
>> We should talk about each rule and I will tell you why I have reached
>> these conclusions. They are not the only possible solution.
>>
>> Each language should have a function that can form a term's plural and
>> feminine forms.
>> There should be a function that takes in a text and a language code and
>> compiles a list of the terms it contains.
>> There should be a function that takes in a text, a language and a target
>> language. It will return an exact translation for controlled terms and an
>> approximate translation of the rest using Google Translate.
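
As an illustration of the term-listing function described above, here is a minimal sketch. It is written in Python rather than R purely for illustration, and everything in it is an assumption: the `THESAURUS` contents, the name `find_terms`, and the crude plural handling (an optional trailing "s" or "es") that stands in for the per-language plural function the proposal asks for.

```python
import re

# Hypothetical mini-thesaurus: language code -> preferred terms (singular, lower case).
THESAURUS = {
    "en": ["galaxy", "spiral galaxy", "star", "black hole"],
}


def find_terms(text, lang, thesaurus=THESAURUS):
    """Return the controlled terms of `lang` that occur in `text`, in thesaurus order."""
    lowered = text.lower()
    found = []
    for term in thesaurus.get(lang, []):
        # Whole-word match; the optional "s"/"es" suffix is a placeholder
        # for a real language-specific plural function.
        pattern = r"\b" + re.escape(term) + r"(?:e?s)?\b"
        if re.search(pattern, lowered):
            found.append(term)
    return found


if __name__ == "__main__":
    print(find_terms("A spiral galaxy contains many stars.", "en"))
```

A language with no thesaurus simply yields an empty list. Irregular plurals such as "galaxies" are not caught by this placeholder rule, which is one reason a proper per-language inflection function would be needed.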
>>
>> As for Semantic Web processing, for any text+language:
>> There should be a function that returns the greatest common term: the
>> term that contains all the other mentioned terms.
>> There should be a function that returns the smallest distinctors (an
>> invented idea): the terms that are the most detailed (the leaves in the
>> thesaurus graph).
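
The two functions described above could be sketched roughly as follows. This is only one possible reading, again in Python for illustration: the `BROADER` table and all function names are hypothetical, single inheritance is assumed (each term has at most one broader term), the common term is taken to be the most specific term that is equal to or broader than every mentioned term, and the "smallest distinctors" are taken to be the mentioned terms that are not broader than any other mentioned term.

```python
# Hypothetical thesaurus fragment: term -> broader term (None marks the root).
BROADER = {
    "astronomical object": None,
    "galaxy": "astronomical object",
    "spiral galaxy": "galaxy",
    "elliptical galaxy": "galaxy",
    "star": "astronomical object",
}


def ancestors(term, broader=BROADER):
    """Return [term, its broader term, ..., root]."""
    chain = []
    while term is not None:
        chain.append(term)
        term = broader[term]
    return chain


def greatest_common_term(terms, broader=BROADER):
    """Most specific term that is equal to or broader than every given term."""
    terms = list(terms)
    if not terms:
        return None
    common = ancestors(terms[0], broader)
    for term in terms[1:]:
        chain = set(ancestors(term, broader))
        # Keep only candidates that also lie on this term's path to the root.
        common = [t for t in common if t in chain]
    return common[0] if common else None


def smallest_distinctors(terms, broader=BROADER):
    """Given terms that are not broader than any other given term."""
    terms = list(terms)
    covered = set()
    for term in terms:
        # Everything strictly above a mentioned term is "covered" by it.
        covered.update(ancestors(term, broader)[1:])
    return [t for t in terms if t not in covered]


if __name__ == "__main__":
    mentioned = ["galaxy", "spiral galaxy", "star"]
    print(greatest_common_term(mentioned))
    print(smallest_distinctors(mentioned))
```

Supporting multiple inheritance, which the proposal also mentions, would require walking a graph of broader terms instead of a single chain.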
>>
>> Thesauri data and all these functions should be available in R (in the
>> subject.ro package).
>>
>>
>> We need scientific guidance on where this technology should lead and what
>> use cases can be derived. Please send feedback.
>>
>> Christian Tzurcanu, subject.ro
>>
>>
>>
>> _______________________________________________
>> open-access mailing list
>> open-access at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/open-access
>> Unsubscribe: https://lists.okfn.org/mailman/options/open-access
>>
>>
>>
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>
>> _______________________________________________
>> open-science mailing list
>> open-science at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/open-science
>> Unsubscribe: https://lists.okfn.org/mailman/options/open-science
>>
>>
>
>
> --
> Ted Strauss
> Co-founder of Trudat.co <http://trudat.co/>
>



-- 
Ted Strauss
Co-founder of Trudat.co <http://trudat.co/>