[ckan-dev] import.io

Matthew Fullerton matt.fullerton at gmail.com
Thu Oct 13 09:40:48 UTC 2016


Of course :-)

If you look at https://github.com/okfde/ckanext-offenedaten and in
particular
https://github.com/okfde/ckanext-offenedaten/tree/master/ckanext/offenedaten/harvesters
you can see how we trigger such scripts. Because of the project's history,
the actual scripts live in a separate repository:
https://github.com/SebastianBerchtold/odm-catalogreaders

In particular,
https://github.com/SebastianBerchtold/odm-catalogreaders/blob/master/odm/catalogs/portals/bochum.py
scrapes HTML rather than just reading an API.
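As a rough idea of what scraping HTML (rather than reading an API) involves, here is a minimal Python sketch that uses only the standard library. It is not the Bochum reader. The assumption that data files appear as plain `<a>` links ending in `.csv`, `.xls`, `.xlsx` or `.json` is hypothetical, and so are the names `DataLinkParser` and `find_data_links`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# File extensions treated as downloadable data (an assumption for this sketch).
DATA_EXTENSIONS = (".csv", ".xls", ".xlsx", ".json")


class DataLinkParser(HTMLParser):
    """Collects (href, link text) pairs for <a> tags that point at data files."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.lower().endswith(DATA_EXTENSIONS):
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside a data link we are tracking.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def find_data_links(html, base_url):
    """Hypothetical helper: return CKAN-style resource dicts for data links in a page."""
    parser = DataLinkParser()
    parser.feed(html)
    parser.close()
    return [
        {"url": urljoin(base_url, href), "name": text or href}
        for href, text in parser.links
    ]
```

Each returned dict carries the `url` and `name` keys that a CKAN resource uses, so the list could be passed as the `resources` of a new dataset.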

Best,
Matt

On 13 October 2016 at 11:30, Bois Francois-Xavier <fxbois at gmail.com> wrote:

> Matthew
>
> do you have an example of such a script? ("... with it you can write
> scripts that grab data ...")
>
> Best
>
> Fx
>
> On Thu, Oct 13, 2016 at 11:22 AM, Matthew Fullerton <matt.fullerton at gmail.com> wrote:
>
>> Hi Oliver,
>> The basic extension to be aware of is
>> https://github.com/ckan/ckanext-harvest: with it you can write scripts that
>> grab data (e.g. a scraper), push it into CKAN and have this happen
>> automatically on a regular basis. But you don't want to write scripts; you
>> want to use import.io.
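ckanext-harvest splits a harvester into three stages, `gather_stage`, `fetch_stage` and `import_stage`, which the extension runs in turn for each harvest job and which can be scheduled to repeat. The plain-Python sketch below only mimics that shape so the flow is visible. It does not import ckanext-harvest; the `ToyHarvester` class, its in-memory source and the name-slugging rule are hypothetical. A real harvester implements the extension's `IHarvester` interface, works on `HarvestObject` records, and creates datasets through the CKAN action API in its import stage.

```python
import re


class ToyHarvester:
    """Mimics the gather/fetch/import split used by ckanext-harvest (illustration only)."""

    def __init__(self, source):
        # Hypothetical remote source: maps an item identifier to its raw title.
        self.source = source

    def gather_stage(self):
        # Decide which remote items should be harvested and return their identifiers.
        return sorted(self.source)

    def fetch_stage(self, guid):
        # Retrieve the raw content for one item (a real harvester would download it).
        return {"guid": guid, "content": self.source[guid]}

    def import_stage(self, harvested):
        # Map raw content to a CKAN dataset dict; a real harvester would call
        # package_create or package_update here instead of returning the dict.
        # CKAN dataset names allow lowercase letters, digits, "-" and "_".
        name = re.sub(r"[^a-z0-9_-]+", "-", harvested["guid"].lower()).strip("-")
        return {"name": name, "title": harvested["content"]}


def run_harvest(harvester):
    """Run the three stages in order for every gathered item."""
    return [
        harvester.import_stage(harvester.fetch_stage(guid))
        for guid in harvester.gather_stage()
    ]
```

Keeping the stages separate is what lets the real extension queue, retry and schedule each step independently.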
>>
>> I don't know of any connection between those two. Import.io is cool but a
>> very closed system. I recently submitted a small grant proposal to build an
>> extension to connect https://morph.io with CKAN. morph.io is free and
>> open source and gives you the ground work for the scraping as well as
>> taking care of the scheduling. All that is necessary is either an extension
>> to morph.io ("publish to CKAN") or a hook in CKAN (morph supports
>> triggering a URL every time a scrape completes) that pulls the latest data
>> from morph.io. The full text of the proposal is in German, but that's the
>> gist of it. It's something I'd be very interested in working on even if it
>> doesn't get funded, as I am currently handling this with Amazon Web
>> Services Lambda functions, which all feels too manual and scattered.
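The "hook in CKAN" option could look roughly like this: when morph.io calls the webhook URL after a scrape, a handler downloads the latest rows through morph.io's data API (`https://api.morph.io/<owner>/<scraper>/data.json?key=...&query=...`) and shapes them as a `datastore_upsert` request for CKAN. This is a minimal sketch under assumptions: the scraper name, API key and resource id are placeholders, the table name `data` is assumed, and `on_scrape_finished` is a hypothetical handler; the webhook routing itself (a CKAN plugin or a small web service) is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def morph_data_url(scraper, api_key, table="data"):
    """Build a morph.io data API URL returning every row of `table` as JSON.

    `scraper` is "owner/name"; the default table name "data" is an assumption.
    """
    params = urlencode({"key": api_key, "query": f"select * from {table}"})
    return f"https://api.morph.io/{scraper}/data.json?{params}"


def fetch_latest_rows(scraper, api_key):
    """Download the scraper's current rows (a list of dicts) from morph.io."""
    with urlopen(morph_data_url(scraper, api_key)) as response:
        return json.load(response)


def datastore_payload(resource_id, rows, method="upsert"):
    """Shape rows as the data dict for CKAN's datastore_upsert action.

    method="upsert" needs a primary key on the DataStore table;
    use "insert" to simply append rows.
    """
    return {"resource_id": resource_id, "records": rows, "method": method}


def on_scrape_finished(scraper, api_key, resource_id):
    """Hypothetical webhook handler: pull from morph.io, return the CKAN payload."""
    return datastore_payload(resource_id, fetch_latest_rows(scraper, api_key))
```

The returned dict would then be POSTed to CKAN's `/api/3/action/datastore_upsert` endpoint with an API key allowed to write to that resource.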
>>
>> Back to import.io: maybe they also let you call some code or a URL after
>> a scrape? Writing new data into CKAN programmatically (either new datasets
>> or rows of data in an existing resource) is quite easy through its API.
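As a rough illustration of writing into CKAN programmatically: each CKAN action is exposed over HTTP as a POST to `/api/3/action/<action_name>` with a JSON body, authorised by an API key in the `Authorization` header, and the JSON reply carries `success` and `result` fields. The sketch below uses only the Python standard library; the helper names `ckan_action_request` and `call_action` are hypothetical, and the site URL, API key, organisation and resource id mentioned after it are placeholders.

```python
import json
from urllib.request import Request, urlopen


def ckan_action_request(ckan_url, action, data, api_key):
    """Build (but do not send) a POST request for a CKAN action API call."""
    url = f"{ckan_url.rstrip('/')}/api/3/action/{action}"
    return Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def call_action(ckan_url, action, data, api_key):
    """Send the request and return the action's `result`.

    CKAN answers errors with a 4xx status, which urlopen raises as HTTPError;
    the `success` check below is a second line of defence.
    """
    with urlopen(ckan_action_request(ckan_url, action, data, api_key)) as response:
        reply = json.load(response)
    if not reply.get("success"):
        raise RuntimeError(f"CKAN action {action} failed: {reply.get('error')}")
    return reply["result"]
```

For example, `call_action(site, "package_create", {"name": "street-trees", "title": "Street trees", "owner_org": "my-org"}, api_key)` would create a dataset, while `datastore_upsert` with `{"resource_id": ..., "records": [...], "method": "insert"}` would append rows to an existing resource.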
>>
>> Best,
>> Matt
>>
>>
>>
>> On 12 October 2016 at 14:46, Oliver Standeven <os214 at kitc-solutions.co.uk> wrote:
>>
>>> Hello all,
>>>
>>> I am a university student who has worked with CKAN at my previous place
>>> of employment (on my placement) and I have suggested CKAN as an option for
>>> one of the projects that I am now working on in my final year.
>>>
>>> The client wants to use import.io to scrape some information from
>>> websites to list it all in one place. I wondered whether anybody has
>>> experience using import.io, and whether there are any CKAN extensions
>>> that could help me get started with a proof of concept, or whether I would
>>> have to build that myself. I think CKAN would be great for categorizing
>>> the data and making it openly available.
>>>
>>> Thanks in advance,
>>>
>>> Oliver
>>>
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>
>>>
>>
>