[ckan-discuss] CKAN feature roadmap. Support VOID files and SPARQL service description

Mark Wainwright mark.wainwright at okfn.org
Tue Jul 16 11:04:12 BST 2013


> A secondary benefit could be that CKAN records could be expressed as RDF
> making them more reusable as well.

Just to point out that CKAN records are already exposed as RDF - see
http://datahub.io/uniprot.rdf or try

wget --header "Accept:application/rdf+xml" http://datahub.io/dataset/uniprot
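
For anyone scripting this rather than using wget, the same content negotiation might look roughly like the Python sketch below. It uses only the standard library; the function names (build_rdf_request, fetch_rdf) are made up for illustration, and it assumes the server still honours the Accept header as described above.

```python
import urllib.request


def build_rdf_request(url, accept="application/rdf+xml"):
    """Build a GET request that asks the server for an RDF/XML representation."""
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_rdf(url, timeout=30):
    """Dereference url with the Accept header set and return the body as text."""
    with urllib.request.urlopen(build_rdf_request(url), timeout=timeout) as resp:
        # Assumes a UTF-8 body; undecodable bytes are replaced rather than raising.
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_rdf("http://datahub.io/dataset/uniprot") should then return the same RDF/XML that the wget command retrieves, network and server permitting.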

Tim's tool also looks useful. Perhaps it would be a good idea to start
a page on the CKAN wiki <https://github.com/okfn/ckan/wiki> on tips
for using CKAN with RDF / linked data?
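
As a possible starting point for such a page, here is a rough Python sketch of the "VoID in, CKAN update out" step discussed further down the thread. It is not Tim's add-metadata.py and makes several assumptions: the statistics (triple count, link counts, last-modified date) are taken to be already extracted from the VoID file into a plain dict, the extras key names are illustrative, and the function names are hypothetical. It builds a request for the CKAN action API (package_update) but deliberately does not send it.

```python
import json
import urllib.request


def void_to_ckan_payload(dataset_name, void_stats):
    """Map a few VoID-style statistics onto a CKAN dataset dict.

    void_stats is assumed to come from some VoID parser; its keys
    ("triples", "links", "modified") are illustrative, not a fixed schema.
    """
    extras = []
    if "triples" in void_stats:
        extras.append({"key": "triples", "value": str(void_stats["triples"])})
    # One extra per linked dataset, in a stable (sorted) order.
    for target, count in sorted(void_stats.get("links", {}).items()):
        extras.append({"key": "links:" + target, "value": str(count)})
    if "modified" in void_stats:
        extras.append({"key": "void_modified", "value": void_stats["modified"]})
    return {"name": dataset_name, "extras": extras}


def build_update_request(ckan_url, api_key, payload):
    """Build (but do not send) a POST to the CKAN package_update action."""
    url = ckan_url.rstrip("/") + "/api/3/action/package_update"
    headers = {"Content-Type": "application/json", "Authorization": api_key}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers)
```

Passing the resulting request to urllib.request.urlopen would perform the update. Since package_update is generally understood to replace the stored record, a real tool would probably fetch the existing dataset first and merge these extras into it rather than send this minimal payload as-is.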

Mark

On 10/07/2013, Timothy Lebo <lebot at rpi.edu> wrote:
> Jerven,
>
> On Jul 10, 2013, at 9:39 AM, Jerven Bolleman <me at jerven.eu> wrote:
>
>> Hi Tim,
>>
>> This shows that not much work is needed for CKAN to support VoID files as
>> a source for dataset descriptors.
>
> I'm glad it helps point the way.
>
>> However, to be honest it does not solve my perception of what I think is
>> the problem in the CKAN/datahub.io approach.
>> And that is that a push model from providers is not sustainable in the
>> long run.
>> What is needed instead is an information pull model i.e. crawling for
>> dataset descriptors on the web and
>> regular updates.
>
>
> I am certain that the code I point to embraces your pull model. As I said, I
> do *one* HTTP dereference to your data site to obtain a single VoID
> representation, then I walk it and stuff it into CKAN.
> That's pull. And anybody can do it, not necessarily the original data
> provider. So, a CKAN instance could do it as part of its "harvesting"
> (which, I must admit, I have no direct experience with, so perhaps a CKANer
> can fill in some details there).
>
>>
>> I currently need to bend my infrastructure and spend time for CKAN
>> registration for no real benefit to me.
>
> The only work that you should do with my proposal is "get the VoID
> description right", which I would hope should be within your interests
> anyway.
>
>
>> And I need to do this for all other dataset aggregators.
>
> Affirm that they, too, should be aggregating based on your VoID, as you
> mention.
>
>> This does not scale, which is why the UniProt CKAN
>> data is badly out of date and not structured the way I would like it.
>
> Perhaps I can invoke my add-metadata.py script against your void, and we can
> see how it goes?
>
>>
>> To solve this problem in the long run, CKAN needs to start pulling data
>> from provided structured sources
>
> yes, as VoID :-)
>
>> instead of
>> making me and everyone else push information into CKAN all the time.
>
> Agreed. Just publish your VoID and let anyone run my script.
>
> Best,
> Tim
>
>>
>> Regards,
>> Jerven
>>
>>
>>
>> On Jul 5, 2013, at 3:53 PM, Timothy Lebo wrote:
>>
>>> Jerven,
>>>
>>> I have a python (optional SADI-based) script [1] that will walk a "good"
>>> VoID file and lower the descriptions into the CKAN representation.
>>> I call this nightly with cron and feed it the VoID that resolves from my
>>> data site's /void URI.
>>>
>>> e.g., you can see the daily updates as more vocabularies are used and an
>>> example URI is added:
>>> http://datahub.io/dataset/history/ichoose
>>>
>>> Datasets in the http://datahub.io/group/prizms group do this based on my
>>> Prizms linked data integration and publication platform [2].
>>>
>>> HTH.
>>>
>>> Regards,
>>> Tim Lebo
>>>
>>>
>>> [1]
>>> https://github.com/timrdf/DataFAQs/blob/master/services/sadi/ckan/add-metadata.py
>>> [2] https://github.com/timrdf/prizms/wiki
>>>
>>>
>>>
>>>
>>> On Jul 4, 2013, at 10:03 AM, Mark Wainwright <mark.wainwright at okfn.org>
>>> wrote:
>>>
>>>> This is interesting, though I'm not sure how it would work in
>>>> practice. E.g. would it be sufficient for you to have a tool you could
>>>> run (by invoking something like "voidckanupdate void.rdf
>>>> http://datahub.io") which automatically extracted the information you
>>>> wanted to record from the VoID file, and updated the Datahub via the
>>>> API?
>>>>
>>>> Mark
>>>>
>>>>
>>>> On 04/07/2013, Jerven Bolleman <me at jerven.eu> wrote:
>>>>> The number of triples, number of links to other datasets, last update
>>>>> etc...
>>>>>
>>>>> Mainly we need one point for maintaining this kind of data that is
>>>>> pulled.
>>>>> Instead of the current approach of
>>>>> visit datahub.io make changes manually
>>>>> visit identifiers.org make changes manually
>>>>> visit biodbcore make changes manually
>>>>> etc...
>>>>>
>>>>> i.e. currently as a large data provider we need to visit quite a lot of
>>>>> this kind of site to fill in and maintain all dataset metadata.
>>>>> This is not sustainable, which is why I am happy that the other sites are
>>>>> looking into parsing VoID files.
>>>>> I hope that datahub.io can do so as well.
>>>>>
>>>>> Regards,
>>>>> Jerven
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 4, 2013 at 3:32 PM, Mark Wainwright
>>>>> <mark.wainwright at okfn.org> wrote:
>>>>>
>>>>>> Hmm, I guess the common use case is for metadata that doesn't change
>>>>>> every month (address, type, description, licence, etc). What is it
>>>>>> you're updating monthly? What specific functionality on the Datahub
>>>>>> are you suggesting?
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On 04/07/2013, Jerven Bolleman <me at jerven.eu> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> This is a desired feature to remove manual overhead of maintaining
>>>>>>> the same dataset information in many different databases of databases.
>>>>>>>
>>>>>>> For example, the UniProt SPARQL endpoint has metadata in its service
>>>>>>> description, which you can retrieve with:
>>>>>>>
>>>>>>> wget --header="Accept:application/rdf+xml" "http://beta.sparql.uniprot.org/"
>>>>>>> (Expect major improvements to this output in the coming months)
>>>>>>>
>>>>>>> Or the attached void file.
>>>>>>>
>>>>>>> Instead of us updating all this information manually every month, we
>>>>>>> would rather generate a single VoID file that tools and lists other than
>>>>>>> datahub could use as well.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jerven
>>>>>>>
>>>>>>> PS. now with gzipped void file.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jerven Bolleman
>>>>>>> me at jerven.eu
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Business development and user engagement manager
>>>>>> The Open Knowledge Foundation
>>>>>> Empowering through Open Knowledge
>>>>>> http://okfn.org/  |  @okfn  |  http://ckan.org  |  @CKANproject
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jerven Bolleman
>>>>> me at jerven.eu
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>


-- 
Business development and user engagement manager
The Open Knowledge Foundation
Empowering through Open Knowledge
http://okfn.org/  |  @okfn  |  http://ckan.org  |  @CKANproject


