[ckan-discuss] CKAN feature roadmap. Support VOID files and SPARQL service description
Ross Jones
ross at servercode.co.uk
Wed Jul 10 15:03:35 BST 2013
Jerven,
Might this be something that could be implemented as an extension to the harvester at https://github.com/okfn/ckanext-harvest?
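
A very rough sketch of how that might look, just to make the idea concrete (it assumes the gather/fetch/import stages of ckanext-harvest's HarvesterBase; the method names are from memory, so check the extension before relying on them):

    # Sketch only: pull dataset descriptions from a provider's VoID file.
    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    from ckanext.harvest.harvesters.base import HarvesterBase
    from ckanext.harvest.model import HarvestObject

    VOID = Namespace('http://rdfs.org/ns/void#')
    DCTERMS = Namespace('http://purl.org/dc/terms/')

    class VoidHarvester(HarvesterBase):

        def info(self):
            return {'name': 'void',
                    'title': 'VoID file',
                    'description': 'Harvests dataset metadata from a VoID file'}

        def gather_stage(self, harvest_job):
            # The harvest source URL is expected to resolve to a VoID document.
            graph = Graph()
            graph.parse(harvest_job.source.url)
            object_ids = []
            for dataset in graph.subjects(RDF.type, VOID.Dataset):
                obj = HarvestObject(guid=str(dataset), job=harvest_job,
                                    content=graph.serialize(format='turtle'))
                obj.save()
                object_ids.append(obj.id)
            return object_ids

        def fetch_stage(self, harvest_object):
            # Nothing to fetch separately: the VoID graph was stored above.
            return True

        def import_stage(self, harvest_object):
            graph = Graph()
            graph.parse(data=harvest_object.content, format='turtle')
            dataset = URIRef(harvest_object.guid)
            title = graph.value(dataset, DCTERMS.title) or harvest_object.guid
            package_dict = {
                'name': self._gen_new_name(str(title)),
                'title': str(title),
                'notes': str(graph.value(dataset, DCTERMS.description) or ''),
            }
            return self._create_or_update_package(package_dict, harvest_object)

That would give you the pull model you describe below: the provider publishes one VoID file and the harvester re-reads it on a schedule.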
Ross
On 10 Jul 2013, at 14:39, Jerven Bolleman <me at jerven.eu> wrote:
> Hi Tim,
>
> This shows that not much work is needed for CKAN to support VoID files as a source for dataset descriptors.
> However, to be honest, it does not solve what I see as the real problem in the CKAN/datahub.io approach:
> a push model from providers is not sustainable in the long run.
> What is needed instead is a pull model, i.e. crawling the web for dataset descriptors and updating them
> regularly.
>
> I currently need to bend my infrastructure and spend time on CKAN registration for no real benefit to me,
> and I need to do the same for every other dataset aggregator. This does not scale, which is why the UniProt
> CKAN data is badly out of date and not structured the way I would like it.
>
> To solve this problem in the long run, CKAN needs to start pulling data from the structured sources that
> providers already publish, instead of making me and everyone else push information into CKAN all the time.
>
> Regards,
> Jerven
>
>
>
> On Jul 5, 2013, at 3:53 PM, Timothy Lebo wrote:
>
>> Jerven,
>>
>> I have a Python (optionally SADI-based) script [1] that will walk a "good" VoID file and lower the descriptions into the CKAN representation.
>> I call this nightly with cron and feed it the VoID that resolves from my data site's /void URI.
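>>
>> Roughly, the idea is something like the following (a simplified sketch, not the actual script in [1]; the /void URI, dataset name and API key are placeholders):
>>
>>     # Sketch: read the site's VoID, pull out the bits CKAN should record,
>>     # and write them back to datahub.io through the action API.
>>     import requests
>>     from rdflib import Graph, Namespace
>>
>>     VOID = Namespace('http://rdfs.org/ns/void#')
>>     CKAN = 'http://datahub.io/api/3/action'
>>     API_KEY = '...'                       # your datahub.io API key
>>
>>     graph = Graph()
>>     graph.parse('http://example.org/void')          # the site's /void URI
>>
>>     vocabularies = sorted(set(str(v) for v in graph.objects(None, VOID.vocabulary)))
>>
>>     # Fetch the current dataset, fold the VoID-derived values into it, and
>>     # write it back (a real script would update existing extras in place
>>     # rather than appending blindly).
>>     pkg = requests.get(CKAN + '/package_show', params={'id': 'ichoose'}).json()['result']
>>     pkg.setdefault('extras', []).append({'key': 'namespaces',
>>                                          'value': ' '.join(vocabularies)})
>>     requests.post(CKAN + '/package_update', json=pkg,
>>                   headers={'Authorization': API_KEY})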
>>
>> e.g., you can see the daily updates as more vocabularies are used and an example URI is added:
>> http://datahub.io/dataset/history/ichoose
>>
>> Datasets in the http://datahub.io/group/prizms group do this based on my Prizms linked data integration and publication platform [2].
>>
>> HTH.
>>
>> Regards,
>> Tim Lebo
>>
>>
>> [1] https://github.com/timrdf/DataFAQs/blob/master/services/sadi/ckan/add-metadata.py
>> [2] https://github.com/timrdf/prizms/wiki
>>
>>
>>
>>
>> On Jul 4, 2013, at 10:03 AM, Mark Wainwright <mark.wainwright at okfn.org> wrote:
>>
>>> This is interesting, though I'm not sure how it would work in
>>> practice. E.g. would it be sufficient for you to have a tool you could
>>> run (by invoking something like "voidckanupdate void.rdf
>>> http://datahub.io") which automatically extracted the information you
>>> wanted to record from the VoID file, and updated the Datahub via the
>>> API?
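>>>
>>> Entirely hypothetical, just to give the shape of it (the tool name and the update_ckan_from_void helper are made up for illustration):
>>>
>>>     # Hypothetical skeleton for such a "voidckanupdate" command.
>>>     import argparse
>>>
>>>     def main():
>>>         parser = argparse.ArgumentParser(prog='voidckanupdate')
>>>         parser.add_argument('void_file', help='VoID file or URL to read')
>>>         parser.add_argument('ckan_url', help='CKAN site to update, e.g. http://datahub.io')
>>>         parser.add_argument('--apikey', required=True, help='CKAN API key')
>>>         args = parser.parse_args()
>>>         # update_ckan_from_void would extract the wanted fields from the
>>>         # VoID and push them through the CKAN API (hypothetical helper).
>>>         update_ckan_from_void(args.void_file, args.ckan_url, args.apikey)
>>>
>>>     if __name__ == '__main__':
>>>         main()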
>>>
>>> Mark
>>>
>>>
>>> On 04/07/2013, Jerven Bolleman <me at jerven.eu> wrote:
>>>> The number of triples, the number of links to other datasets, the last update date,
>>>> etc.
>>>>
>>>> Mainly we need a single place to maintain this kind of data, from which everyone else pulls,
>>>> instead of the current approach of:
>>>> visit datahub.io and make changes manually,
>>>> visit identifiers.org and make changes manually,
>>>> visit biodbcore and make changes manually,
>>>> etc.
>>>>
>>>> i.e. currently, as a large data provider, we need to visit quite a lot of
>>>> sites like these to fill in and maintain all our dataset metadata.
>>>> This is not sustainable, which is why I am happy that the other sites are
>>>> looking into parsing VoID files.
>>>> I hope that datahub.io can do so as well.
>>>>
>>>> Regards,
>>>> Jerven
>>>>
>>>>
>>>>
>>>> On Thu, Jul 4, 2013 at 3:32 PM, Mark Wainwright
>>>> <mark.wainwright at okfn.org>wrote:
>>>>
>>>>> Hmm, I guess the common use case is for metadata that doesn't change
>>>>> every month (address, type, description, licence, etc). What is it
>>>>> you're updating monthly? What specific functionality on the Datahub
>>>>> are you suggesting?
>>>>>
>>>>> Mark
>>>>>
>>>>> On 04/07/2013, Jerven Bolleman <me at jerven.eu> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> This is a desired feature: it would remove the manual overhead of maintaining the
>>>>>> same dataset information in many different databases of databases.
>>>>>>
>>>>>> For example, the UniProt SPARQL endpoint has metadata in its service
>>>>>> description, which you can retrieve with:
>>>>>>
>>>>>> wget --header="Accept: application/rdf+xml" "http://beta.sparql.uniprot.org/"
>>>>>> (Expect major improvements to this output in the coming months)
>>>>>>
>>>>>> Or the attached void file.
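>>>>>>
>>>>>> An aggregator could pull the statistics we maintain (triple counts, link counts, last update) with little more than this (a sketch; it assumes requests and rdflib, and that the figures use standard VoID / Dublin Core properties):
>>>>>>
>>>>>>     # Same content negotiation as the wget call above, then read the
>>>>>>     # VoID statistics out of the returned RDF.
>>>>>>     import requests
>>>>>>     from rdflib import Graph, Namespace
>>>>>>
>>>>>>     VOID = Namespace('http://rdfs.org/ns/void#')
>>>>>>     DCTERMS = Namespace('http://purl.org/dc/terms/')
>>>>>>
>>>>>>     response = requests.get('http://beta.sparql.uniprot.org/',
>>>>>>                             headers={'Accept': 'application/rdf+xml'})
>>>>>>     graph = Graph()
>>>>>>     graph.parse(data=response.text, format='xml')
>>>>>>
>>>>>>     for dataset in graph.subjects(VOID.triples, None):
>>>>>>         print(dataset,
>>>>>>               'triples:', graph.value(dataset, VOID.triples),
>>>>>>               'modified:', graph.value(dataset, DCTERMS.modified))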
>>>>>>
>>>>>> Instead of us updating all this information manually every month, we would
>>>>>> rather generate a single VoID file that other tools and lists besides
>>>>>> datahub.io could use as well.
>>>>>>
>>>>>> Regards,
>>>>>> Jerven
>>>>>>
>>>>>> PS: now with a gzipped VoID file attached.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jerven Bolleman
>>>>>> me at jerven.eu
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Business development and user engagement manager
>>>>> The Open Knowledge Foundation
>>>>> Empowering through Open Knowledge
>>>>> http://okfn.org/ | @okfn | http://ckan.org | @CKANproject
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jerven Bolleman
>>>> me at jerven.eu
>>>>
>>>
>>>
>>> --
>>> Business development and user engagement manager
>>> The Open Knowledge Foundation
>>> Empowering through Open Knowledge
>>> http://okfn.org/ | @okfn | http://ckan.org | @CKANproject
>>>
>>
>
>
> _______________________________________________
> ckan-discuss mailing list
> ckan-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-discuss
> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-discuss