[datahub-discuss] Using Datahub for scientific data and metadata

Ross Jones ross at servercode.co.uk
Fri Dec 6 11:35:43 UTC 2013


Hi Peter,

The best Java client is Tim Lebo’s version at https://github.com/timrdf/CKANClient-J, which is a fork of a fork. It was written for an older CKAN version but still works. We were talking about merging the changes back into https://github.com/okfn/CKANClient-J, but I’m not sure that happened.

I don’t think uploading that amount of data would be problematic. If you are uploading the XML as resources directly into datahub.io, we’d need to check that CKANClient-J supports upload correctly.
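
For illustration only, here is a rough sketch of what registering one XML file as a resource might look like if you called CKAN's action API (resource_create) directly instead of going through CKANClient-J. It is not taken from CKANClient-J and I have not checked it against datahub.io. It assumes Java 11+ for java.net.http; the class name, dataset id, file URL and API key are all made-up placeholders. It links a resource by URL rather than uploading the file bytes, and it only builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CkanResourceSketch {

    // Quote a value as a JSON string literal, escaping quotes,
    // backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.append('"').toString();
    }

    // Body for resource_create: package_id names the dataset the resource
    // belongs to, url points at the XML file. The values are placeholders.
    static String resourceJson(String packageId, String name, String url) {
        return "{\"package_id\":" + jsonString(packageId)
             + ",\"name\":" + jsonString(name)
             + ",\"format\":\"XML\""
             + ",\"url\":" + jsonString(url) + "}";
    }

    // Build (but do not send) the POST request. CKAN reads the API key
    // from the Authorization header.
    static HttpRequest buildRequest(String siteUrl, String apiKey, String json) {
        return HttpRequest.newBuilder(URI.create(siteUrl + "/api/3/action/resource_create"))
                .header("Authorization", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical dataset id, file name and URL, for illustration only.
        String json = resourceJson("example-dataset", "molecule-0001.xml",
                "http://example.org/molecule-0001.xml");
        HttpRequest request = buildRequest("http://datahub.io", "YOUR-API-KEY", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```

To actually send it you would pass the request to an HttpClient and check that the JSON response reports success. A daily cron job would repeat this once per file, so about 150 small requests a day.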

Ross.


On 6 Dec 2013, at 10:28, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:

> [I joined the datahub-discuss list yesterday and posted the following, but apparently it didn't get through so apologies for repost. I'd appreciate confirmation it has got through to the list. ]
> 
> I have become excited about the possibility of using the Datahub for repositing Open scientific information [1] and have started to prototype my application.
> 
> Simply, I am going to extract facts from the scientific literature and store them in Datahub. Some facts will be name-value (String) pairs (e.g. species), others will be structured as XML blobs (e.g. molecules). 
> 
> I intend to search each publisher daily (using cron) and index about 150 papers into metadata and XML. I don't think the initial byte sizes will cripple the Datahub, but I will back off if they do. In the first instance the daily trawl of 150 papers will generate about 100 Kbyte of XML and 1000 metadata tags (name-value strings) per day.
> 
> I use Java and have the following questions:
> * Is the Java API still current for CKAN/Datahub (I think Ross J wrote it and have copied him)?
> * Are there any known issues with what I propose (uploading 150 * 0.5 Kbyte files/day on an automatic basis)?
> 
> Mark Wainwright and I had an initial problem where resetting the metadata caused the data to be deleted - slightly embarrassing, since a reporter from Nature was looking into the repository and couldn't find anything in it. Can the repo be reset if anything goes wrong?
> 
> Hope this makes sense and thanks
> 
> 
> P
> If you are interested in background, read http://blogs.ch.cam.ac.uk/pmr, https://vimeo.com/78353557 (5 mins) and http://www.slideshare.net/petermurrayrust/the-content-mine-presented-at-uksg (slides). 
> 
> [1] Rufus has suggested this for the last 10 years...and reality and vision have coalesced.
> 
> 
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
