[ckan-discuss] CKAN feature roadmap. Support VOID files and SPARQL service description

Wed Jul 10 14:39:21 BST 2013

Hi Tim,

This shows that not much work is needed for CKAN to support VoID files as a source of dataset descriptors.
However, to be honest, it does not solve what I see as the real problem with the CKAN/datahub.io approach:
a push model that relies on providers is not sustainable in the long run.
What is needed instead is an information pull model, i.e. crawling the web for dataset descriptors and
updating them regularly.
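
A minimal sketch of the pull side, assuming the descriptor is served as RDF/XML with typed
void:Dataset elements and literal properties (a real crawler would use a proper RDF parser).
The function names are hypothetical and only the Python standard library is used:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace URIs of the RDF, VoID and Dublin Core Terms vocabularies.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
}


def build_request(url):
    """Prepare a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def parse_void(rdf_xml):
    """Extract a few statistics from each void:Dataset element.

    Assumption: datasets appear as <void:Dataset rdf:about="..."> children of
    the root element, with void:triples and dcterms:modified as literals.
    """
    root = ET.fromstring(rdf_xml)
    datasets = {}
    for node in root.findall("void:Dataset", NS):
        uri = node.get("{%s}about" % NS["rdf"])
        triples = node.findtext("void:triples", default=None, namespaces=NS)
        modified = node.findtext("dcterms:modified", default=None, namespaces=NS)
        datasets[uri] = {
            "triples": int(triples) if triples is not None else None,
            "modified": modified,
        }
    return datasets


def pull(url):
    """Fetch one descriptor and parse it (performs a network request)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return parse_void(response.read().decode("utf-8"))
```

An aggregator could call pull() on a schedule for every descriptor URL it knows about, so that
providers only have to keep their own VoID file current.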

I currently need to bend my infrastructure and spend time on CKAN registration for no real benefit to me.
And I need to do this for all other dataset aggregators. This does not scale, which is why the UniProt CKAN
data is badly out of date and not structured the way I would like it.

To solve this problem in the long run, CKAN needs to start pulling data from provided structured sources instead of
making me and everyone else push information into CKAN all the time.
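
To illustrate what the registry side of such a pull could do with harvested statistics, here is
a rough sketch. It assumes the package record stores extra metadata as a list of key/value
dicts under "extras"; the key names ("triples", "modified") and both function names are
hypothetical. It only decides whether a record needs updating and sends nothing over the network:

```python
def to_ckan_extras(stats):
    """Turn a flat dict of statistics into a list of key/value extras, skipping None."""
    return [
        {"key": key, "value": str(value)}
        for key, value in sorted(stats.items())
        if value is not None
    ]


def plan_update(package, stats):
    """Return an updated copy of the package dict, or None if nothing changed."""
    current = {extra["key"]: extra["value"] for extra in package.get("extras", [])}
    merged = dict(current)
    merged.update({extra["key"]: extra["value"] for extra in to_ckan_extras(stats)})
    if merged == current:
        return None  # harvested values match the stored record
    updated = dict(package)
    updated["extras"] = [{"key": k, "value": v} for k, v in sorted(merged.items())]
    return updated
```

Returning None when nothing changed means a nightly crawl would only produce a new revision
when the provider's descriptor actually differs from what is stored.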

Regards,
Jerven

On Jul 5, 2013, at 3:53 PM, Timothy Lebo wrote:

> Jerven,
> 
> I have a python (optional SADI-based) script [1] that will walk a "good" VoID file and lower the descriptions into the CKAN representation.
> I call this nightly with cron and feed it the VoID that resolves from my data site's /void URI.
> 
> e.g., you can see the daily updates as more vocabularies are used and an example URI is added:
> http://datahub.io/dataset/history/ichoose
> 
> Datasets in the http://datahub.io/group/prizms group do this based on my Prizms linked data integration and publication platform [2].
> 
> HTH.
> 
> Regards,
> Tim Lebo
> 
> 
> [1] https://github.com/timrdf/DataFAQs/blob/master/services/sadi/ckan/add-metadata.py
> [2] https://github.com/timrdf/prizms/wiki
> 
> 
> 
> 
> On Jul 4, 2013, at 10:03 AM, Mark Wainwright <mark.wainwright at okfn.org> wrote:
> 
>> This is interesting, though I'm not sure how it would work in
>> practice. E.g. would it be sufficient for you to have a tool you could
>> run (by invoking something like "voidckanupdate void.rdf
>> http://datahub.io") which automatically extracted the information you
>> wanted to record from the VoID file, and updated the Datahub via the
>> API?
>> 
>> Mark
>> 
>> 
>> On 04/07/2013, Jerven Bolleman <me at jerven.eu> wrote:
>>> The number of triples, the number of links to other datasets, the last
>>> update, etc.
>>> 
>>> Mainly we need a single point for maintaining this kind of data, from
>>> which it is pulled, instead of the current approach of:
>>> visit datahub.io, make changes manually
>>> visit identifiers.org, make changes manually
>>> visit biodbcore, make changes manually
>>> etc.
>>> 
>>> i.e. currently, as a large data provider, we need to visit quite a lot of
>>> these sites to fill in and maintain all dataset metadata.
>>> This is not sustainable, which is why I am happy that the other sites are
>>> looking into parsing VoID files.
>>> I hope that datahub.io can do so as well.
>>> 
>>> Regards,
>>> Jerven
>>> 
>>> 
>>> 
>>> On Thu, Jul 4, 2013 at 3:32 PM, Mark Wainwright
>>> <mark.wainwright at okfn.org> wrote:
>>> 
>>>> Hmm, I guess the common use case is for metadata that doesn't change
>>>> every month (address, type, description, licence, etc). What is it
>>>> you're updating monthly? What specific functionality on the Datahub
>>>> are you suggesting?
>>>> 
>>>> Mark
>>>> 
>>>> On 04/07/2013, Jerven Bolleman <me at jerven.eu> wrote:
>>>>> Hi All,
>>>>> 
>>>>> This is a desired feature to remove the manual overhead of maintaining
>>>>> the same dataset information in many different databases of databases.
>>>>> 
>>>>> For example, the UniProt SPARQL endpoint has metadata in its service
>>>>> description, which you can retrieve with:
>>>>> 
>>>>> wget --header="Accept:application/rdf+xml"
>>>>> "http://beta.sparql.uniprot.org/"
>>>>> (Expect major improvements to this output in the coming months)
>>>>> 
>>>>> Or the attached void file.
>>>>> 
>>>>> Instead of updating all this information manually every month, we would
>>>>> rather generate a single VoID file that tools and lists other than
>>>>> datahub could use as well.
>>>>> 
>>>>> Regards,
>>>>> Jerven
>>>>> 
>>>>> PS. now with gzipped void file.
>>>>> 
>>>>> 
>>>>> --
>>>>> Jerven Bolleman
>>>>> me at jerven.eu
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Business development and user engagement manager
>>>> The Open Knowledge Foundation
>>>> Empowering through Open Knowledge
>>>> http://okfn.org/  |  @okfn  |  http://ckan.org  |  @CKANproject
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jerven Bolleman
>>> me at jerven.eu
>>> 
>> 
>> 
>> 
>> 
>