[ckan-discuss] CKAN for chemistry

Mon Feb 21 13:29:35 GMT 2011

On Mon, Feb 21, 2011 at 2:10 PM, Jonathan Gray <jonathan.gray at okfn.org> wrote:
> Egon: would be great to have any further input from you on what
> changes you'd suggest on the basis of your experiences as a user!

I would first like to get some consensus on how CKAN should be used. A
clear definition or description of what should go into one record
is important at this point.

- how should mere dataset aggregations be handled? (Bio2RDF versus ChEMBL)
- how should alternative APIs be handled? (ChEMBL @ EBI versus a
SPARQL end point)

The problem is that a record has a single maintainer, a single
license, and a single version associated with it, which makes it much
like a "dataset instance". On the other hand, with so many SPARQL end
points currently being separate from the upstream or original dataset,
we do need some means of linking things together, for example:

catalogRecord:X :derivedFrom catalogRecord:Y

Alternatively (and I think I have seen this used), the SPARQL end
point might simply be listed as a URL on the record of the 'original'
dataset.

It seems to me that people have been resorting to custom fields just
to keep the data somewhat consistent.

It would be good to make it clearer how the catalog should be filled
and how datasets should be properly annotated, before we start the
LODD/IsItOpenData hacking session in March...

Egon

PS. Something that is very simple to fix is to add this combined license:

"Creative Commons Attribution Share-Alike"

which is used by, for example, ChEMBL.

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers