[ckan-discuss] Publishing to CKAN the Southampton Way

Wed Mar 30 18:18:32 BST 2011

Oh, I'd be happy to look into making our system CKAN-harvest-friendly if 
someone is willing to advise, but I'm plenty busy, so no hurry.

On 30/03/11 18:12, Christopher Gutteridge wrote:
> The API worked OK. It was a bit annoying not to be able to use the 
> standard URI for the license, but it was easily mapped to your ID.
>
> I think the technique is eminently repeatable, but a pull would make 
> it simpler.
>
> What we have found is that our initial dataset entries may require an 
> overhaul. Some things are needlessly divided up. Which is annoying 
> because we could get mileage out of the number of datasets.
>
> I'm also considering what recommendations we should make to JISC/JANET 
> about a .ac.uk catalogue, which may make more sense as a pull.
>
>
>
> On 30/03/11 14:48, Rufus Pollock wrote:
>> On 10 March 2011 15:16, Christopher Gutteridge <cjg at ecs.soton.ac.uk> wrote:
>>> Rufus has asked me to drop into the list to share what we've done to
>>> integrate data.southampton.ac.uk into CKAN.
>> I think this is really exciting, Chris, and thanks for posting to the 
>> list.
>>
>>> Our datasets have URIs of the form
>>> http://id.southampton.ac.uk/dataset/places
>>> and HTML pages
>>> http://data.southampton.ac.uk/dataset/places.html
>>> They also have the now-traditional range of linked data API options,
>>> which aren't quite working yet for me.
>>> http://data.southampton.ac.uk/dataset/places.rdf
>>> http://data.southampton.ac.uk/dataset/places.xml
>>> you know the kind of thing.
>>>
>>> They also have a couple of unusual export views
>>> http://data.southampton.ac.uk/dataset/places.ckan.json
>>> http://data.southampton.ac.uk/dataset/places.boilerplate.rdf
>>>
>>> The first of these provides the JSON to inject into the CKAN API to update
>>> or create the dataset, the second generates the triples which get appended
>>> to the dataset each time it's published. (License etc.) The scripts which do
>>> this are actually entirely different from the ones which do places.html and
>>> there's .htaccess jiggerypokery, but the result seems more elegant to my
>>> sysprog eyes.
>> Understood -- the point is you are always appending some standard set
>> of attributes (e.g. your standard license) when exporting to CKAN.
>>
>>> And of course, our scripts are available to all:
>>> https://github.com/cgutteridge/Grinder/tree/master/bin
>> Thanks a lot. Do you have any feedback (positive or negative) on the
>> API based on your experience using it?
>>
>> I'm also interested because it seems one could generalize this
>> approach for other people publishing data -- another approach is a
>> pull model in which CKAN harvests data from your end (given a
>> harvesting endpoint ...)
>>
>> Rufus
>
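
For anyone who wants to reuse the URL scheme described in the quoted
message, here is a minimal sketch of how a dataset URI could be mapped to
its document URLs. It only reproduces the pattern visible in the example
URLs quoted above; it is not how the site does it (that is a set of separate
scripts plus .htaccess rules), and the function name export_urls and the
constant names are made up for illustration.

```python
from urllib.parse import urlsplit

# Suffixes copied from the example URLs in the quoted message.
EXPORT_SUFFIXES = ("html", "rdf", "xml", "ckan.json", "boilerplate.rdf")

# Hosts as they appear in the examples: identifiers on one, documents on the other.
ID_HOST = "id.southampton.ac.uk"
DATA_HOST = "data.southampton.ac.uk"


def export_urls(dataset_uri):
    """Return a dict of suffix -> document URL for one dataset URI.

    Hypothetical helper: assumes every dataset follows the same pattern
    as the 'places' example.
    """
    parts = urlsplit(dataset_uri)
    if parts.netloc != ID_HOST or not parts.path.startswith("/dataset/"):
        raise ValueError("not a dataset URI: %r" % dataset_uri)
    base = "http://%s%s" % (DATA_HOST, parts.path.rstrip("/"))
    return {suffix: "%s.%s" % (base, suffix) for suffix in EXPORT_SUFFIXES}


urls = export_urls("http://id.southampton.ac.uk/dataset/places")
print(urls["ckan.json"])
```

The "ckan.json" entry is the document whose JSON would be sent to the CKAN
API; the push itself is left out here because it depends on the CKAN
instance and API key in use.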

-- 
Christopher Gutteridge -- http://id.ecs.soton.ac.uk/person/1248

/ Lead Developer, EPrints Project, http://eprints.org/
/ Web Projects Manager, ECS, University of Southampton, http://www.ecs.soton.ac.uk/
/ Webmaster, Web Science Trust, http://www.webscience.org/