[datahub-discuss] Using Datahub for scientific data and metadata

Peter Murray-Rust pm286 at cam.ac.uk
Fri Dec 6 18:09:14 UTC 2013


Thanks Ross,



On Fri, Dec 6, 2013 at 11:35 AM, Ross Jones <ross at servercode.co.uk> wrote:

> Hi Peter,
>
> The best Java client (although it is actually for an old CKAN version now,
> but still works) is Tim Lebo’s version at
> https://github.com/timrdf/CKANClient-J which is a fork of a fork - we
> were talking about merging changes back into
> https://github.com/okfn/CKANClient-J but I’m not sure they happened.
>

Yes, we've got this working. We put it under Maven and would be happy to
merge if that's a good thing to do. With some inspired guessing we found
the XML under Resource.getUrl() and were able to download it and verify it.

So that's great. The API looks nice. We gave it high marks except for the
rather tacky use of mvn in a shell script rather than in a POM.
Unfortunately I seem to need an API key to Update/Create, and I'm not
sure where mine has gone or who I am on the Datahub (email, login, etc.).
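
For anyone following along, here is a minimal sketch of what an
authenticated Create call might look like. It does not use CKANClient-J;
it assumes the CKAN version 3 action API (package_create, with the API key
sent in the Authorization header) and Java 11's java.net.http classes. The
class name, the CKAN_API_KEY environment variable and the dataset name are
hypothetical, and the request is only built and printed, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class CkanCreateSketch {

    // Builds (but does not send) a POST to CKAN's package_create action.
    // Assumption: the server exposes the v3 action API under /api/3/action/.
    static HttpRequest buildPackageCreate(String baseUrl, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/3/action/package_create"))
                // CKAN reads the user's API key from the Authorization header.
                .header("Authorization", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical environment variable holding the key from the user's profile page.
        String apiKey = System.getenv("CKAN_API_KEY");
        if (apiKey == null) {
            System.out.println("Set CKAN_API_KEY first; Update/Create calls need it.");
            return;
        }
        // Hypothetical dataset name; package_create requires at least "name".
        String body = "{\"name\": \"example-extracted-facts\"}";
        HttpRequest request = buildPackageCreate("https://datahub.io", apiKey, body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Reading the key from the environment keeps it out of the source tree,
which matters for a job that will run unattended from cron.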

>
> I don’t think uploading that amount of data would be problematic. If you
> are uploading the XML as resources directly into datahub.io we’d need to
> check that the CKANClient-J supports upload correctly.
>
>
Good. Are you able to tell me who I am? peter.murr...etc.  or pm2...etc.?

P.



> Ross.
>
>
> On 6 Dec 2013, at 10:28, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
> [I joined the datahub-discuss list yesterday and posted the following, but
> apparently it didn't get through so apologies for repost. I'd appreciate
> confirmation it has got through to the list. ]
>
> I have become excited about the possibility of using the Datahub for
> repositing Open scientific information [1] and have started to prototype
> my application.
>
> Simply, I am going to extract facts from the scientific literature and
> store them in Datahub. Some facts will be name-value (String) pairs (e.g.
> species), others will be structured as XML blobs (e.g. molecules).
>
> I intend to search each publisher daily (using cron) and index about 150
> papers into metadata and XML. I don't think the initial byte sizes will
> cripple the Datahub but I will back off if it does. In the first instance
> the daily trawl of 150 papers will generate about 100 Kb of XML and 1000
> metadata tags (name-value strings) per day.
>
> I use Java and have the following questions:
> * Is the Java API still current for CKAN/Datahub (I think Ross J wrote it
> and have copied him)?
> * Are there any known issues in what I propose (uploading 150 * 0.5 Kbyte
> files /day on an automatic basis)?
>
> Mark Wainwright and I had an initial problem where resetting the metadata
> caused the data to be deleted - slightly embarrassing since a reporter from
> Nature was looking into the repository and couldn't find anything in it. Can
> the repo be reset if anything goes wrong?
>
> Hope this makes sense and thanks
>
>
> P
> If you are interested in background, read http://blogs.ch.cam.ac.uk/pmr,
> https://vimeo.com/78353557 (5 mins) and
> http://www.slideshare.net/petermurrayrust/the-content-mine-presented-at-uksg (slides).
>
> [1] Rufus has suggested this for the last 10 years...and reality and
> vision have coalesced.
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
>
>
> _______________________________________________
> datahub-discuss mailing list
> datahub-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/datahub-discuss
>
>


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069