[datahub-discuss] Using Datahub for scientific data and metadata

Peter Murray-Rust pm286 at cam.ac.uk
Fri Dec 6 19:06:08 UTC 2013


Our revised (mavenized, XML) version of the CKANClient-J is at:
https://bitbucket.org/petermr/ckanclient.  A typical query of the species
projects is

package org.xmlcml.ckan.GetTest.test_GetXMLDOIs()

We are happy to revert to the previous or central package names and
artifacts if you advise us.
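
For anyone wanting to repeat the "download it and verify it" step mentioned
in the quoted message below, here is a minimal sketch of one way to do it. It
assumes that "verify" means no more than checking that the XML is well-formed,
and it uses only the JDK, not the CKANClient-J API. The class name
XmlResourceCheck and its methods are hypothetical; the URL string is expected
to be whatever Resource.getUrl() returned.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of CKANClient-J.
class XmlResourceCheck {

    /** Returns true if the stream parses as well-formed XML. */
    static boolean isWellFormed(InputStream in) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow errors instead of letting the parser print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignore warnings */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(in);
            return true;
        } catch (SAXException | ParserConfigurationException e) {
            return false;
        }
    }

    /** Convenience overload for XML that is already held in memory. */
    static boolean isWellFormed(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return isWellFormed(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Downloads the resource at the given URL and checks it. */
    static boolean isWellFormedAt(String resourceUrl) throws IOException {
        try (InputStream in = URI.create(resourceUrl).toURL().openStream()) {
            return isWellFormed(in);
        }
    }
}
```

A caller would pass the string from Resource.getUrl() to
XmlResourceCheck.isWellFormedAt(...). Checking against a schema would need
more than this.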

P




On Fri, Dec 6, 2013 at 6:09 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:

> Thanks Ross,
>
>
>
> On Fri, Dec 6, 2013 at 11:35 AM, Ross Jones <ross at servercode.co.uk> wrote:
>
>> Hi Peter,
>>
>> The best Java client (although it is actually for an old CKAN version
>> now, but still works) is Tim Lebo’s version at
>> https://github.com/timrdf/CKANClient-J which is a fork of a fork - we
>> were talking about merging changes back into
>> https://github.com/okfn/CKANClient-J but I’m not sure they happened.
>>
>
> Yes, we've got this working. We put it under Maven and would be happy to
> merge if that's a good thing to do. With some inspired guessing we found
> the XML under Resource.getUrl() and were able to download it and verify it.
>
> So that's great. The API looks nice. We gave it high marks except for the
> rather tacky use of mvn in a shell script rather than in a POM.
> Unfortunately I seem to have to get an APIKEY to Update/Create and I'm not
> sure where mine has gone and also who I am (email, login, etc.).
>
>>
>> I don’t think uploading that amount of data would be problematic. If you
>> are uploading the XML as resources directly into datahub.io we’d need to
>> check that the CKANClient-J supports upload correctly.
>>
>>
> Good. Are you able to tell me who I am? peter.murr...etc.  or pm2...etc.?
>
> P.
>
>
>
>> Ross.
>>
>>
>> On 6 Dec 2013, at 10:28, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>
>> [I joined the datahub-discuss list yesterday and posted the following,
>> but apparently it didn't get through so apologies for repost. I'd
>> appreciate confirmation it has got through to the list. ]
>>
>> I have become excited about the possibility of using the Datahub for
>> repositing Open scientific information [1] and have started to prototype
>> my application.
>>
>> Simply, I am going to extract facts from the scientific literature and
>> store them in Datahub. Some facts will be name-value (String) pairs (e.g.
>> species), others will be structured as XML blobs (e.g. molecules).
>>
>> I intend to search each publisher daily (using cron) and index about 150
>> papers into metadata and XML. I don't think the initial byte sizes will
>> cripple the Datahub but I will back off if it does. In the first instance
>> the daily trawl of 150 papers will generate about 100 KB of XML and 1000
>> metadata tags (name-value strings) per day.
>>
>> I use Java and have the following questions:
>> * Is the Java API still current for CKAN/Datahub (I think Ross J wrote it
>> and have copied him)?
>> * Are there any known issues in what I propose (uploading 150 * 0.5 Kbyte
>> files /day on an automatic basis)?
>>
>> Mark Wainwright and I had an initial problem where resetting the metadata
>> caused the data to be deleted - slightly embarrassing since a reporter from
>> Nature was looking into the repository and couldn't find anything in it. Can
>> the repo be reset if anything goes wrong?
>>
>> Hope this makes sense and thanks
>>
>>
>> P
>> If you are interested in background, read http://blogs.ch.cam.ac.uk/pmr,
>> https://vimeo.com/78353557 (5 mins) and
>> http://www.slideshare.net/petermurrayrust/the-content-mine-presented-at-uksg (slides).
>>
>> [1] Rufus has suggested this for the last 10 years...and reality and
>> vision have coalesced.
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>
>>
>>
>> _______________________________________________
>> datahub-discuss mailing list
>> datahub-discuss at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/datahub-discuss
>>
>>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069