[ckan4rdm] Contents of ckan4rdm digest...

Markus BUCHHORN markus.buchhorn at rdsi.uq.edu.au
Thu Feb 13 23:45:18 UTC 2014

Hi Ant, all

This is something that would have a lot of appeal in many circumstances - most folks in Australia at the moment see CKAN as a nice 'simple' repository framework, but not 'up to it' for heavy-duty RDM. The metadata aspect is one of the main reasons for that. The existing standard extraction is a nice start, but if you and Joe are starting to extend it, that will be great for the increased adoption of CKAN.

In terms of generifying it :) there are multiple 'global file format registries' (of course there are multiple, who needs just one!?) which include some basic level of information about how embedded metadata is stored. There are also schema registries for disciplines. Neither kind is complete, and neither covers mapping the metadata fields stored in files to schema value pairs - except in some more organised disciplines, as Hannes noted. So tackling some low-hanging fruit (such as astronomy and FITS, or the biosciences and their gene sequences) would help to demonstrate the value of 'doing the right thing' when collecting the data, because that's when a lot of the (scientific + provenance) metadata is generated. That in turn would make it easier to automate more of the extraction (I can dream :) ).
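
As a rough illustration of what "mapping metadata fields stored in files to schema value pairs" could look like, here is a minimal sketch of a crosswalk table. It is not code from anyone on this list. The FITS keywords on the left are standard reserved keywords; the schema field names on the right and the function name `map_to_schema` are hypothetical, chosen only for the example.

```python
# Crosswalk from embedded FITS header keywords to repository schema fields.
# The right-hand names are hypothetical; a real schema would define its own.
FITS_CROSSWALK = {
    "TELESCOP": "telescope",
    "INSTRUME": "instrument",
    "DATE-OBS": "observation_date",
    "OBJECT": "target_name",
}


def map_to_schema(embedded, crosswalk):
    """Split embedded metadata into schema fields and unmapped leftovers."""
    mapped = {}
    unmapped = {}
    for key, value in embedded.items():
        target = crosswalk.get(key)
        if target is None:
            # Keep what the crosswalk does not cover, so nothing is lost.
            unmapped[key] = value
        else:
            mapped[target] = value
    return mapped, unmapped
```

Returning the unmapped keys separately makes the gaps in a crosswalk visible, which is one way to see how far a given discipline is from full coverage.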

I don't know enough about the internals of CKAN to work on this aspect, but I'd be keen to map the community requirements into such an effort, and maybe help identify early targets and usability aspects, and then communicate it back out to the repository folks in Australia with the help of colleagues.

Markus (former astronomer, former data manager, ...)

From: ckan4rdm [mailto:ckan4rdm-bounces at lists.okfn.org] On Behalf Of Felix Engel
Sent: Friday, 14 February 2014 3:45 AM
To: ckan4rdm at lists.okfn.org
Subject: Re: [ckan4rdm] Contents of ckan4rdm digest...

Hello Ant,

our department is about to adapt CKAN for managing our research data, and metadata seems to be the main obstacle. We are considering writing a feature similar to the one you have developed. So, yes, we would also be interested in looking at your code. It would be great if we could get this turned into a standard extension.

Thanks a lot for sharing.

Best wishes,
On 13.02.2014 15:07, Ant Beck wrote:
:-) Pleasure. I need to speak to the developer (I now work for a different company) so this may take time.

If it seems that things have gone quiet then 'virtually' kick me and I will sort it out.


On 13/02/14 14:05, Stefan Oderbolz wrote:

Hi Ant,

I would be very interested in your approach to solving this "problem". So if you could share your code, this would be really helpful. I'm involved in a project that is just about to decide how to extract metadata for a large number of datasets in various formats and source systems.

Maybe you could put it under a friendly Open Source license on GitHub or similar.

Thanks in advance!

On 13.02.2014 13:11, "Ant Beck" <ant.beck at gmail.com> wrote:
Hi All,

We (http://dartportal.leeds.ac.uk) created a structured, automated metadata creation system to handle our data ingest of thousands of items (different formats, datasets and sensors). There is definitely a need for this, especially for those who may want to bulk upload archives.

Our code is not generic - I'm happy to share it if you would like to generify it, or if it helps in the general thrust. I'm in the process of writing a blog post on our experiences for the OKF archaeology subsection.


On 13/02/14 12:00, ckan4rdm-request at lists.okfn.org wrote:

Message: 1
Date: Wed, 12 Feb 2014 13:46:36 +0100
From: Hannes Thiemann <thiemann at dkrz.de>
To: ckan4rdm at lists.okfn.org
Subject: Re: [ckan4rdm] automatic metadata extraction
Message-ID: <52FB6D2C.7020300 at dkrz.de>
Content-Type: text/plain; charset=ISO-8859-1

Dear Joe,

I believe such a thing could be pretty useful for any domain-specific
repository where the number of formats in use is limited. Automatic
metadata generation adds important value, as it greatly improves the
ease with which the data can be found.

Best, Hannes

On 08.02.2014 04:22, Joe Tsoi wrote:

I'm one of the CKAN core devs and I've created a CKAN extension that
is a bit of a toy example. I hope that it might interest some people
on this list. The extension is for FITS images, which are generally
used in astronomy. When a FITS file is uploaded to CKAN, the extension
automatically parses the file, extracts the metadata and saves it
against the CKAN resource. It also generates a greyscale JPEG image of
the FITS file, which serves as the image preview when the resource is
previewed in CKAN.
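
To give a feel for the parsing step described above, here is a minimal sketch that reads the primary header of a FITS file using only the Python standard library. It is not the ckanext-astro code, and the function names are made up for the example. It relies only on the basic FITS layout (80-character header cards in 2880-byte blocks, terminated by an END card) and ignores details such as continued strings and extension headers, so a real extension would more likely use an established FITS library.

```python
CARD = 80     # each FITS header card is 80 ASCII characters
BLOCK = 2880  # headers are padded to a multiple of 2880 bytes


def parse_value(field):
    """Convert the value part of a header card to a Python value."""
    field = field.strip()
    if field.startswith("'"):
        # Quoted string: find the closing quote, skipping '' (escaped quote).
        end = 1
        while True:
            end = field.index("'", end)
            if field[end:end + 2] == "''":
                end += 2
                continue
            break
        return field[1:end].replace("''", "'").rstrip()
    value = field.split("/", 1)[0].strip()  # drop the trailing comment
    if value == "T":
        return True
    if value == "F":
        return False
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value.replace("D", "E"))  # FITS allows a D exponent
    except ValueError:
        return value


def read_fits_header(stream):
    """Return the primary header of a binary FITS stream as a dict."""
    header = {}
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header")
        for start in range(0, BLOCK, CARD):
            card = block[start:start + CARD].decode("ascii")
            key = card[:8].strip()
            if key == "END":
                return header
            if card[8:10] != "= ":
                continue  # COMMENT, HISTORY and blank cards carry no value
            header[key] = parse_value(card[10:])
```

Called on a file opened in binary mode, `read_fits_header` returns a plain dict of keyword/value pairs, which is the kind of structure that could then be stored as extra fields on a resource.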

I've set up a demo of it at http://astro-joet.rhcloud.com/dataset and
I've uploaded a couple of sample images taken from the Hubble FITS
sample images. The source code for the extension is available at
https://github.com/joetsoi/ckanext-astro . I've actually added a
custom extension point to CKAN, which this extension uses and which
isn't currently available in vanilla CKAN, but I hope to get some form
of it into CKAN core. It's just a demo, so I'm pretty sure if you upload
a non-FITS image it'll break, and it's pretty hacky and is probably

Anyway I'd like to hear if this sort of automatic metadata extraction
is of any use to anyone, not just for astronomy, but for any type of
file. I'm not an astronomer or anything, I just thought it might
serve as a good example.

ckan4rdm mailing list
ckan4rdm at lists.okfn.org


Lic. phil. Felix Engel
PhD candidate
Biological Anthropology - Faculty of Medicine
Albert-Ludwigs-University Freiburg
Hebelstr. 29
79104 Freiburg (Breisgau)

phone: +49 / 761 / 203 - 5526
FAX: +49 / 761 / 203 - 6898