[ckan-dev] Future, flask, breaking things, funding.

Denis Zgonjanin deniszgonjanin at gmail.com
Mon Sep 14 14:33:38 UTC 2015


Yes, we should think of use cases. Realtime data is just one. I'm not just
talking about things we might want to do. Here are the current things in
CKAN that would benefit from better asynchronous support:

- Datastore & Datapusher. We could integrate datapusher into CKAN, so
people don't need to set up an additional web service just to use stock
CKAN.
- Harvesting. Set up a periodic callback that calls harvest sources every
hour. Super easy compared to having to set up Redis/ZeroMQ and
another 3(!) long-running processes in the background.
- Webhooks. They must be pushed off to a celery queue because of Pylons.
With async they could be fired off easily.
- Analytics & analytics reports; Sending automated emails and other
automated tasks.
- Anything where right now we have to set up cron jobs.
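As an illustration of the harvesting point, here is a minimal sketch of an in-process periodic callback using Python 3's asyncio. `harvest_source` and `run_periodically` are hypothetical names, not CKAN or ckanext-harvest APIs, and the harvest job itself is only a placeholder.

```python
import asyncio
from typing import List, Optional

async def harvest_source(url: str, harvested: List[str]) -> None:
    """Hypothetical harvest job; a real one would fetch and import the remote datasets."""
    await asyncio.sleep(0)  # placeholder for non-blocking network I/O
    harvested.append(url)

async def run_periodically(sources: List[str], interval: float,
                           harvested: List[str], rounds: Optional[int] = None) -> None:
    """Harvest every source, then sleep `interval` seconds; repeat `rounds` times (forever if None)."""
    done = 0
    while rounds is None or done < rounds:
        # All sources are harvested concurrently on the same event loop.
        await asyncio.gather(*(harvest_source(url, harvested) for url in sources))
        done += 1
        await asyncio.sleep(interval)

if __name__ == "__main__":
    harvested: List[str] = []
    # Hourly would be interval=3600; a tiny interval and two rounds keep the demo short.
    asyncio.run(run_periodically(["http://example.org/a", "http://example.org/b"],
                                 interval=0.01, harvested=harvested, rounds=2))
    print(harvested)
```

In an async web server the same coroutine could be started once at start-up (for example with `asyncio.create_task`) and would share the event loop with request handling, with no cron entry or extra daemon involved.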

And probably most importantly - CKAN is going to need a face lift
eventually if it's to remain relevant. It can't be stuck in CRUD land
forever. There is plenty of time for this, no rush. But building cool
shiny new things with fancy front-end JavaScript would be hard right now.
It will be hard on any web framework built on the idea that your whole
application context is transferred to the user on every HTTP request, and
that nothing else except that is going on in the backend.


On Mon, Sep 14, 2015 at 9:34 AM, Stéphane Guidoin <
stephane.guidoin at gmail.com> wrote:

> *Now that government is (slowly) catching on, more stream, API, and even
> real-time data is being published. CKAN doesn't do a great job here. The
> biggest obstacle to creating nice extensions to CKAN for non-file data is
> that Pylons is still firmly stuck within the HTTP request-response
> lifecycle. *
>
> I wonder what the role of CKAN should be when it comes to APIs, streams
> and other things. These tend to be fairly resource intensive, and most
> of the time they are developed and hosted on their own, not on the open
> data portal. So what should CKAN's role be here? How much do we
> want to be able to integrate CKAN with APIs and streams, and what should it
> provide?
>
> From my point of view, moving to Flask or another framework is mostly a
> question of technical debt (
> https://18f.gsa.gov/2015/08/07/technical-debt-1/) and of making sure CKAN
> remains flexible (and built-in async would indeed help).
>
> When it comes to supporting realtime data, even if only to
> enable extension development, some thinking about use cases is needed in
> order to avoid jumping into something that would be very time intensive in
> terms of development.
>
> Stéphane
>
>
>
> On 2015-09-14 08:57, Denis Zgonjanin wrote:
>
> Right now CKAN is great for static sources of data, which is really all
> that existed from government sources when CKAN was first written.
>
> Now that government is (slowly) catching on, more stream, API, and even
> real-time data is being published. CKAN doesn't do a great job here. The
> biggest obstacle to creating nice extensions to CKAN for non-file data is
> that Pylons is still firmly stuck within the HTTP request-response
> lifecycle.
>
> This worked well for CRUD apps, but it is now really showing its
> limitations. It's hard to do anything in CKAN that doesn't take place
> within the context of a user's HTTP request. If you want to do some extra
> data processing on the side, you have to use celery queues or worse, cron.
> Worse yet, some people do try to put extra processing inside the
> request-response lifecycle, causing problems.
>
> Even core CKAN is guilty of this. For example, CKAN will call datapusher
> to send upload jobs and retrieve job results, and those requests to
> datapusher happen while the user is waiting for the request to return. This
> is kind of terrible. Not even because somebody did it this way, but because
> CKAN doesn't give you a sane alternative to do it properly.
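To make the DataPusher complaint concrete, here is a minimal sketch of the alternative: the request handler hands the slow call to a background worker and returns at once. It uses the standard library's `concurrent.futures.ThreadPoolExecutor`; `push_to_datapusher` and `handle_resource_update` are hypothetical names, not CKAN's actual DataPusher integration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Tuple

# A shared pool created once when the application starts.
executor = ThreadPoolExecutor(max_workers=4)

def push_to_datapusher(resource_id: str) -> str:
    """Hypothetical stand-in for the HTTP call that submits an upload job to DataPusher."""
    time.sleep(0.05)  # simulate a slow remote service
    return f"job submitted for {resource_id}"

def handle_resource_update(resource_id: str) -> Tuple[Dict[str, str], Future]:
    """Request handler: schedule the slow call in the background and return right away."""
    future = executor.submit(push_to_datapusher, resource_id)
    return {"status": "accepted", "resource_id": resource_id}, future

if __name__ == "__main__":
    response, job = handle_resource_update("abc123")
    print(response)        # available immediately, before the job finishes
    print(job.result())    # blocks here only because this demo waits for it
```

An in-process pool like this loses queued work if the worker process restarts, which is one reason durable queues such as Celery are used today; the sketch only shows the non-blocking shape the request path could have.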
>
> Porting CKAN to Flask is no small feat, so let's make sure we do it right.
> Now that we're not using CKAN to just host static files anymore, we need to
> have better, built-in async support in CKAN. Perhaps this means moving to
> Python 3 where we'll have asyncio (and hopefully a future version of Flask
> will work well with it). Other frameworks, like Tornado, are also quite
> lightweight and support this out of the box for Python 2.x.
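A minimal sketch of what built-in asyncio support could enable for the webhook case: the handler schedules one task per subscriber and returns without waiting for delivery. It assumes Python 3.7+ (`asyncio.run`, `asyncio.create_task`), which post-dates this thread; `send_webhook` and `handle_dataset_update` are hypothetical, and the HTTP POST is simulated with a sleep.

```python
import asyncio
from typing import Dict, List, Tuple

async def send_webhook(url: str, payload: Dict[str, str], delivered: List[str]) -> None:
    """Hypothetical webhook sender; a real one would POST `payload` using an async HTTP client."""
    await asyncio.sleep(0.01)  # stands in for non-blocking network I/O
    delivered.append(url)

async def handle_dataset_update(dataset_id: str, subscribers: List[str],
                                delivered: List[str]) -> Tuple[Dict[str, str], List[asyncio.Task]]:
    """Schedule one webhook per subscriber as a background task, then return without awaiting them."""
    payload = {"id": dataset_id}
    # Keep references to the tasks so they are not garbage-collected before completing.
    tasks = [asyncio.create_task(send_webhook(url, payload, delivered)) for url in subscribers]
    return {"status": "ok", "id": dataset_id}, tasks

async def main() -> None:
    delivered: List[str] = []
    response, tasks = await handle_dataset_update("ds-1", ["http://a.example", "http://b.example"], delivered)
    print(response, delivered)    # response is ready; no webhook has been delivered yet
    await asyncio.gather(*tasks)  # the demo waits so the script does not exit early
    print(sorted(delivered))

if __name__ == "__main__":
    asyncio.run(main())
```

Tornado's coroutines allow a similar pattern on Python 2.x; that variant is not shown here.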
>
> - Denis
>
>
> On Mon, Sep 14, 2015 at 3:56 AM, Angelos Tzotsos <gcpp.kalxas at gmail.com>
> wrote:
>
>> On 09/14/2015 10:24 AM, Ross Jones wrote:
>>
>>> Hi,
>>>
>>> I’ve recently been playing about with implementing parts of CKAN in
>>> Flask side-by-side with the current Pylons implementation. I’m doing it
>>> like this so that it isn’t immediately obvious that there’s a migration
>>> happening towards using Flask (aka nothing breaks).  I don’t think this
>>> branch should ever be merged, it’s more exploratory but it has raised some
>>> questions that I think it would be good to discuss.
>>>
>>> WARNING:anecdata
>>> It’s pretty clear that the vast majority of people asked would like to
>>> move to Flask as a replacement for some layers of the system (leaving
>>> things like logic and plugins alone).
>>> ENDWARNING
>>>
>>> We’ve discussed at the tech-team meetings, but I think a longer, more
>>> accessible conversation would be beneficial.
>>>
>>> 1. What version of CKAN should be targeted? Common sense suggests 3.0,
>>> but that being the case, exactly how far can we go in breaking some
>>> backward compatibility?  This isn’t really a technical question - would be
>>> good to hear what the community would accept …
>>>
>>> 2. Does it *really* need to be side-by-side?  Running Flask and Pylons
>>> side-by-side means staying on Python 2 for another few years (because
>>> Pylons).  A reasonably deep incision and removal of non-logic/non-plugin
>>> code would make a move to Py3 easier, but with some level of breakage in
>>> external plugins. Staying on 2 would mean a move to 3 at a later date and
>>> more pain.
>>>
>>> 3. Would the CKAN Association like to fund someone to do some of this
>>> work? This is just one of several ideas mentioned on
>>> https://github.com/ckan/ideas-and-roadmap/issues/152 that really needs
>>> to be done if CKAN is going to thrive instead of just survive.
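One way the side-by-side approach in point 2 could work is a small WSGI dispatcher that sends already-migrated URL prefixes to the new application and everything else to the old one. This is a minimal sketch under that assumption, not necessarily how the exploratory branch does it; `SideBySideDispatcher` and `make_stub_app` are hypothetical, and the stub apps stand in for the real Flask and Pylons applications.

```python
from typing import Callable, Iterable
from wsgiref.util import setup_testing_defaults

def make_stub_app(name: str) -> Callable:
    """Hypothetical stand-in for the real Flask or Pylons WSGI application."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"{name}: {environ['PATH_INFO']}".encode("utf-8")]
    return app

class SideBySideDispatcher:
    """Send requests under already-migrated URL prefixes to the new app, all others to the old app."""

    def __init__(self, new_app: Callable, old_app: Callable, migrated_prefixes: Iterable[str]):
        self.new_app = new_app
        self.old_app = old_app
        self.migrated_prefixes = list(migrated_prefixes)

    def _is_migrated(self, path: str) -> bool:
        # Match whole path segments, so "/api" covers "/api/3/..." but not "/apidocs".
        return any(path == p or path.startswith(p + "/") for p in self.migrated_prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        app = self.new_app if self._is_migrated(path) else self.old_app
        return app(environ, start_response)

if __name__ == "__main__":
    dispatcher = SideBySideDispatcher(make_stub_app("flask"), make_stub_app("pylons"), ["/api"])
    for path in ["/api/3/action/package_list", "/dataset"]:
        environ = {}
        setup_testing_defaults(environ)
        environ["PATH_INFO"] = path
        print(b"".join(dispatcher(environ, lambda status, headers: None)))
```

Werkzeug also ships a `DispatcherMiddleware` that mounts applications by prefix; the hand-written version above just makes the routing rule explicit. Either way, both stacks run in one Python 2 process, which is the constraint point 2 describes.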
>>>
>>> Any feedback welcome…
>>>
>>> Cheers
>>>
>>> Ross.
>>>
>>>
>>>
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>
>>
>> Hi Ross,
>>
>> I believe that a Flask port (or rewrite) is an excellent idea for CKAN
>> 3.0 in order to support Python 3.x.
>> The alternative would be to port Pylons to Python 3.x, which perhaps is a
>> more difficult task...
>>
>> Given that Python 2.x will EOL relatively soon, CKAN should move forward.
>>
>> Just my 2 cents.
>>
>> Best,
>> Angelos
>>
>> --
>> Angelos Tzotsos, PhD
>> OSGeo Charter Member
>> http://users.ntua.gr/tzotsos
>>