[ckan-dev] Future, flask, breaking things, funding.
Steven De Costa
steven.decosta at linkdigital.com.au
Mon Sep 14 22:04:23 UTC 2015
Looks interesting. Large scientific datasets are also something to handle
in a 3.0 implementation.
I wouldn't mind writing up the CKAN use case paradigm I've been presenting
lately, as that is what I see as the core of user value when you step aside
from the resulting technical value of a solid data management system.
But yeah, improving 'data pipelines' is the thing.
Cheers,
Steven
On Tuesday, September 15, 2015, Joel Natividad <joel.natividad at ontodia.com>
wrote:
> Hi all,
> What about integrating with Dat <http://dat-data.com>?
>
> It handles streaming data; can handle huge datasets; can do deltas (no
> need to re-download a huge dataset over and over again); has versions (not
> just revisions, as data consumers have legitimate reasons to use different
> versions of data, down to the row level); and makes CKAN more "dog-fooding"
> friendly (i.e. publishers using it not only to publish data, but to
> actually build solutions).
>
> Marianne Bellotti (CKAN-powered HDX) and I independently spent some
> quality time with Karissa McKelvey - one of the three key developers
> behind Dat <http://dat-data.com/team>, when she was in NYC last month and
> discussed at length how Dat + CKAN can work together.
>
> Karissa even put together a rough spec on a "ckanext-dat" extension.
>
> FYI, Dat is supported by usopendata.org
> <https://usopendata.org/2015/07/29/dat-beta/>, which also happens to be
> the org behind CKAN-Multisite, which was just announced as generally
> available today. <https://usopendata.org/2015/09/14/ckan-multisite/>
>
> Best,
> Joel
>
>
> --
> Joel Natividad
> +1 347-565-5635
> @jqnatividad
>
> Ontodia, Inc.
> 137 Varick Street, 2nd Floor, New York, NY 10013
>
> On Mon, Sep 14, 2015 at 5:11 PM, Steven De Costa <
> steven.decosta at linkdigital.com.au> wrote:
>
>> I'm 'all in' on this discussion :) I'll set up a Doodle and we can pick a
>> time to do a video call...
>>
>> My 2c on some points.
>>
>> 1. Perhaps redev could be bottom-up. Start with resources and widen its
>> ability. CRUD can then be rebuilt over the top.
>> 2. Carefully consider the longest term possible and how the app may
>> mature in the future.
>> 3. Consider interoperability between n+1 platforms via linked open data,
>> again with realtime in mind
>> 4. Consider packages further. Could we add new package types that are
>> built on 3.0 thinking and have them coexist with current packages? If so
>> then existing extensions could be modified less dramatically to apply only
>> to v2 packages.
>> 5. Think about migration scenarios. Could a v2 CKAN remain as a dumb web
>> app harvesting from a 3.0? If so, we could prioritise workflows around
>> custodians and ETL before end users.
>> 6. Yes I'm sure others in the steering group would support the work. Just
>> remember they are also just volunteers :)
>> 7. Yes I'm sure funding could come from the Association, just so long as
>> funding first goes into the association. So, we'd all have a part to play
>> in signing up paying members - happy to take any leads from people on that
>> point :)
>>
>> Hoots!
>>
>>
>> On Tuesday, September 15, 2015, Denis Zgonjanin <deniszgonjanin at gmail.com>
>> wrote:
>>
>>> Yes, we should think of use cases. Realtime data is just one. I'm not
>>> just talking about things we might want to do. Here are the current things
>>> in CKAN that would benefit from better asynchronous support:
>>>
>>> - Datastore & Datapusher. We could integrate datapusher into CKAN, so
>>> people don't need to set up an additional web service just to use stock
>>> CKAN.
>>> - Harvesting. Set up a periodic callback that calls harvest sources
>>> every hour. Super easy when compared to having to set up Redis/ZeroMQ, and
>>> another 3(!) long-running processes running in the background.
>>> - Webhooks. They must be pushed off to a Celery queue because of Pylons.
>>> With async they could be fired off easily.
>>> - Analytics & analytics reports; sending automated emails and other
>>> automated tasks.
>>> - Anything where right now we have to set up cron jobs.
>>>
>>> And probably most importantly - CKAN is going to need a face lift
>>> eventually if it's to remain relevant. It can't be stuck in CRUD land
>>> forever. There is plenty of time for this, no rush. But building cool
>>> shiny new things with fancy front-end JavaScript would be hard right now.
>>> It will be hard on any web framework built on the idea that your whole
>>> application context is transferred to the user on every HTTP request, and
>>> that nothing else except that is going on in the backend.
>>>
>>>
>>> On Mon, Sep 14, 2015 at 9:34 AM, Stéphane Guidoin <
>>> stephane.guidoin at gmail.com> wrote:
>>>
>>>> *Now that government is (slowly) catching on, more stream, API, and
>>>> even real-time data is being published. CKAN doesn't do a great job here.
>>>> The biggest obstacle to creating nice extensions to CKAN for non-file data
>>>> is that Pylons is still firmly stuck within the HTTP request-response
>>>> lifecycle. *
>>>>
>>>> I wonder what the role of CKAN should be when it comes to APIs, streams
>>>> and other things. Those things tend to be fairly resource intensive and most
>>>> of the time, they are developed and hosted on their own, not on the open
>>>> data portal. So what should be the role of CKAN on this? How much do we
>>>> want to be able to integrate CKAN with APIs and streams, and what should it
>>>> offer?
>>>>
>>>> From my point of view, moving to Flask or another framework is mostly a
>>>> question of technical debt (
>>>> https://18f.gsa.gov/2015/08/07/technical-debt-1/) and making sure CKAN
>>>> remains flexible (and built-in async would indeed help).
>>>>
>>>> When it comes to supporting realtime data, even if it's mainly to
>>>> enable extension development, some thinking about use cases is needed
>>>> in order to avoid jumping into something that would be very time-intensive
>>>> in terms of dev.
>>>>
>>>> Stéphane
>>>>
>>>>
>>>>
>>>> On 2015-09-14 08:57, Denis Zgonjanin wrote:
>>>>
>>>> Right now CKAN is great for static sources of data, which is really all
>>>> that existed from government sources when CKAN was first written.
>>>>
>>>> Now that government is (slowly) catching on, more stream, API, and even
>>>> real-time data is being published. CKAN doesn't do a great job here. The
>>>> biggest obstacle to creating nice extensions to CKAN for non-file data is
>>>> that Pylons is still firmly stuck within the HTTP request-response
>>>> lifecycle.
>>>>
>>>> This worked well for CRUD apps, but now is really showing its
>>>> limitations. It's hard to do anything in CKAN that doesn't take place
>>>> within the context of a user's HTTP request. If you want to do some extra
>>>> data processing on the side, you have to use Celery queues or worse, cron.
>>>> Worse yet, some people do try to put extra processing inside the
>>>> request-response lifecycle, causing problems.
>>>>
>>>> Even core CKAN is guilty of this. For example, CKAN will call
>>>> datapusher to send upload jobs and retrieve job results, and those requests
>>>> to datapusher happen while the user is waiting for the request to return.
>>>> This is kind of terrible. Not even because somebody did it this way, but
>>>> because CKAN doesn't give you a sane alternative to do it properly.
>>>>
>>>> Porting CKAN to Flask is no small feat, so let's make sure we do it
>>>> right. Now that we're not using CKAN to just host static files anymore, we
>>>> need to have better, built-in async support in CKAN. Perhaps this means
>>>> moving to Python 3 where we'll have asyncio (and hopefully a future version
>>>> of Flask will work well with it). Other frameworks, like Tornado, are also
>>>> quite lightweight and support this out of the box for Python 2.x.
>>>>
>>>> - Denis
>>>>
>>>>
>>>> On Mon, Sep 14, 2015 at 3:56 AM, Angelos Tzotsos <gcpp.kalxas at gmail.com>
>>>> wrote:
>>>>
>>>>> On 09/14/2015 10:24 AM, Ross Jones wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’ve recently been playing about with implementing parts of CKAN in
>>>>>> Flask side-by-side with the current Pylons implementation. I’m doing it
>>>>>> like this so that it isn’t immediately obvious that there’s a migration
>>>>>> happening towards using Flask (aka nothing breaks). I don’t think this
>>>>>> branch should ever be merged, it’s more exploratory but it has raised some
>>>>>> questions that I think it would be good to discuss.
>>>>>>
>>>>>> WARNING:anecdata
>>>>>> It’s pretty clear that the vast majority of people asked would like
>>>>>> to move to Flask as a replacement for some layers of the system (leaving
>>>>>> things like logic and plugins alone).
>>>>>> ENDWARNING
>>>>>>
>>>>>> We’ve discussed at the tech-team meetings, but I think a longer, more
>>>>>> accessible conversation would be beneficial.
>>>>>>
>>>>>> 1. What version of CKAN should be targeted? Common sense suggests
>>>>>> 3.0, but that being the case, exactly how far can we go in breaking some
>>>>>> backward compatibility? This isn’t really a technical question - would be
>>>>>> good to hear what the community would accept …
>>>>>>
>>>>>> 2. Does it *really* need to be side-by-side? Running Flask and
>>>>>> Pylons side-by-side means staying on Python 2 for another few years
>>>>>> (because Pylons). A reasonably deep incision and removal of
>>>>>> non-logic/non-plugin code would make a move to Py3 easier, but with some
>>>>>> level of breakage in external plugins. Staying on 2 would mean a move to 3
>>>>>> at a later date and more pain.
>>>>>>
>>>>>> 3. Would the CKAN Association like to fund someone to do some of this
>>>>>> work? This is just one of several ideas mentioned on
>>>>>> https://github.com/ckan/ideas-and-roadmap/issues/152 that really
>>>>>> needs to be done if CKAN is going to thrive instead of just survive.
>>>>>>
>>>>>> Any feedback welcome…
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Ross.
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> ckan-dev mailing list
>>>>>> ckan-dev at lists.okfn.org
>>>>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>>>>
>>>>>
>>>>> Hi Ross,
>>>>>
>>>>> I believe that a Flask port (or rewrite) is an excellent idea for CKAN
>>>>> 3.0 in order to support Python 3.x.
>>>>> The alternative would be to port Pylons to Python 3.x, which is perhaps
>>>>> a more difficult task...
>>>>>
>>>>> Given that Python 2.x will EOL relatively soon, CKAN should move
>>>>> forward.
>>>>>
>>>>> Just my 2 cents.
>>>>>
>>>>> Best,
>>>>> Angelos
>>>>>
>>>>> --
>>>>> Angelos Tzotsos, PhD
>>>>> OSGeo Charter Member
>>>>> http://users.ntua.gr/tzotsos
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> *STEVEN DE COSTA* | *EXECUTIVE DIRECTOR*
>> www.linkdigital.com.au
>>
>>
>>
>>
>>
>>
>
--
*STEVEN DE COSTA* | *EXECUTIVE DIRECTOR*
www.linkdigital.com.au