[ckan-dev] Future, flask, breaking things, funding.

Joel Natividad joel.natividad at ontodia.com
Mon Sep 14 21:51:33 UTC 2015


Hi all,
What about integrating with Dat <http://dat-data.com>?

It handles streaming data, can handle huge datasets, can do deltas (no need
to re-download a huge dataset over and over again), and has versions (not just
revisions, as data consumers have legitimate reasons to use different
versions of data, down to the row level). It would also make CKAN more
"dog-fooding" friendly (i.e. publishers using it not only to publish data, but
to actually build solutions).
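
Dat's own protocol and API are not shown here. The idea behind row-level "deltas" can be illustrated with a minimal, hypothetical sketch: assuming each row has a stable key (an `id` column here), a consumer holding one version only needs the rows that were added, removed, or changed to reach the next one. `row_delta` and `apply_delta` are illustrative names, not part of Dat or CKAN.

```python
from typing import Any, Dict, List

# Illustrative only: this is not Dat's API or wire format.
Row = Dict[str, Any]


def row_delta(old: List[Row], new: List[Row], key: str = "id") -> Dict[str, List[Row]]:
    """Return only the rows that differ between two versions of a table."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    return {
        "added": [r for k, r in new_by_key.items() if k not in old_by_key],
        "removed": [r for k, r in old_by_key.items() if k not in new_by_key],
        "changed": [r for k, r in new_by_key.items()
                    if k in old_by_key and old_by_key[k] != r],
    }


def apply_delta(old: List[Row], delta: Dict[str, List[Row]], key: str = "id") -> List[Row]:
    """Rebuild the new version from the old copy plus the delta."""
    rows = {row[key]: row for row in old}
    for row in delta["removed"]:
        rows.pop(row[key], None)
    for row in delta["added"] + delta["changed"]:
        rows[row[key]] = row  # insert new rows, overwrite changed ones
    return list(rows.values())


old = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
new = [{"id": 2, "val": "B"}, {"id": 3, "val": "c"}]
delta = row_delta(old, new)
# delta == {"added": [{"id": 3, "val": "c"}],
#           "removed": [{"id": 1, "val": "a"}],
#           "changed": [{"id": 2, "val": "B"}]}
```

Only the delta needs to travel; the consumer applies it to the copy it already has. How Dat actually encodes and transfers changes is outside the scope of this sketch.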

Marianne Bellotti (of the CKAN-powered HDX) and I independently spent some
quality time with Karissa McKelvey, one of the three key developers behind Dat
<http://dat-data.com/team>, when she was in NYC last month, and discussed at
length how Dat and CKAN can work together.

Karissa even put together a rough spec on a "ckanext-dat" extension.

FYI, Dat is supported by usopendata.org
<https://usopendata.org/2015/07/29/dat-beta/>, which also happens to be the
org behind CKAN-Multisite, which was just announced as generally available
today. <https://usopendata.org/2015/09/14/ckan-multisite/>

Best,
Joel


--
Joel Natividad
+1 347-565-5635
@jqnatividad

Ontodia, Inc.
137 Varick Street, 2nd Floor, New York, NY 10013

On Mon, Sep 14, 2015 at 5:11 PM, Steven De Costa <
steven.decosta at linkdigital.com.au> wrote:

> I'm 'all in' on this discussion :) I'll set up a Doodle poll and we can pick a
> time to do a video call...
>
> My 2c on some points.
>
> 1. Perhaps the redevelopment could be bottom-up: start with resources and
> widen what they can do. CRUD can then be rebuilt over the top.
> 2. Carefully consider the longest term possible and how the app may mature
> in the future.
> 3. Consider interoperability between n+1 platforms via linked open data,
> again with realtime in mind.
> 4. Consider packages further. Could we add new package types that are
> built on 3.0 thinking and have them coexist with current packages? If so,
> existing extensions could be modified less dramatically to apply only
> to v2 packages.
> 5. Think about migration scenarios. Could a v2 CKAN remain as a dumb web
> app harvesting from a 3.0? If so, we could prioritise workflows around
> custodians and ETL before end users.
> 6. Yes, I'm sure others in the steering group would support the work. Just
> remember they are also just volunteers :)
> 7. Yes, I'm sure funding could come from the Association, so long as
> funding first goes into the Association. So we'd all have a part to play
> in signing up paying members; happy to take any leads from people on that
> point :)
>
> Hoots!
>
>
> On Tuesday, September 15, 2015, Denis Zgonjanin <deniszgonjanin at gmail.com>
> wrote:
>
>> Yes, we should think of use cases. Realtime data is just one. And I'm not
>> just talking about things we might want to do. Here are things CKAN
>> already does today that would benefit from better asynchronous support:
>>
>> - Datastore & Datapusher. We could integrate datapusher into CKAN, so
>> people don't need to set up an additional web service just to use stock
>> CKAN.
>> - Harvesting. Set up a periodic callback that calls harvest sources every
>> hour. Super easy compared to having to set up Redis/ZeroMQ and
>> another 3(!) long-running processes in the background.
>> - Webhooks. Because of Pylons, they must be pushed off to a Celery queue.
>> With async they could simply be fired off.
>> - Analytics & analytics reports; Sending automated emails and other
>> automated tasks.
>> - Anything where right now we have to set up cron jobs.
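
A minimal sketch of the first and third items, assuming Python 3's `asyncio` (which Pylons does not offer): a periodic callback takes the place of a cron job, and a webhook is scheduled in the background instead of being pushed onto a Celery queue. `harvest_source`, `send_webhook`, `fire_webhook`, `every`, and the example.org URLs are hypothetical stand-ins, not CKAN or ckanext-harvest APIs.

```python
import asyncio
from typing import Awaitable, Callable, List, Set

harvest_log: List[str] = []
webhook_log: List[str] = []
_pending: Set[asyncio.Task] = set()


async def harvest_source(url: str) -> None:
    # Hypothetical stand-in for fetching and importing one harvest source.
    await asyncio.sleep(0)  # yields to the event loop, as real network I/O would
    harvest_log.append(url)


async def send_webhook(event: str) -> None:
    # Hypothetical stand-in for an HTTP POST to a webhook subscriber.
    await asyncio.sleep(0.01)
    webhook_log.append(event)


def fire_webhook(event: str) -> None:
    """Schedule a webhook in the background; the caller does not wait for it."""
    task = asyncio.create_task(send_webhook(event))
    _pending.add(task)  # keep a reference so the task is not garbage-collected early
    task.add_done_callback(_pending.discard)


async def every(interval: float, job: Callable[[], Awaitable[None]], times: int) -> None:
    """Run `job` repeatedly; a real scheduler would loop forever with interval=3600."""
    for _ in range(times):
        await job()
        await asyncio.sleep(interval)


async def main() -> None:
    sources = ["https://example.org/catalog-a", "https://example.org/catalog-b"]

    async def harvest_all() -> None:
        for url in sources:
            await harvest_source(url)
        fire_webhook("harvest-finished")  # returns immediately

    await every(0.01, harvest_all, times=3)
    await asyncio.gather(*_pending)  # only so this demo exits after the last webhook


asyncio.run(main())
```

Everything runs in one process and one event loop, so there is no broker and no extra long-running daemon; the trade-off is that any blocking call inside a job stalls every other job on the loop.
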
>>
>> And probably most importantly - CKAN is going to need a face lift
>> eventually if it's to remain relevant. It can't be stuck in CRUD land
>> forever. There is plenty of time for this, no rush. But building cool,
>> shiny new things with fancy front-end JavaScript would be hard right now.
>> It will be hard on any web framework built on the idea that your whole
>> application context is transferred to the user on every HTTP request, and
>> that nothing else except that is going on in the backend.
>>
>>
>> On Mon, Sep 14, 2015 at 9:34 AM, Stéphane Guidoin <
>> stephane.guidoin at gmail.com> wrote:
>>
>>> *Now that government is (slowly) catching on, more stream, API, and even
>>> real-time data is being published. CKAN doesn't do a great job here. The
>>> biggest obstacle to creating nice extensions to CKAN for non-file data is
>>> that Pylons is still firmly stuck within the HTTP request-response
>>> lifecycle. *
>>>
>>> I wonder what the role of CKAN should be when it comes to APIs, streams
>>> and similar things. These tend to be fairly resource-intensive, and most
>>> of the time they are developed and hosted on their own, not on the open
>>> data portal. So what should CKAN's role be here? How much do we want to be
>>> able to integrate CKAN with APIs and streams, and what should it provide?
>>>
>>> From my point of view, moving to Flask or another framework is mostly a
>>> question of technical debt
>>> (https://18f.gsa.gov/2015/08/07/technical-debt-1/) and of making sure CKAN
>>> remains flexible (and built-in async would indeed help).
>>>
>>> When it comes to supporting realtime data, even if it's mainly to enable
>>> extension development, some thinking about use cases is needed in order
>>> to avoid jumping into something that would be very time-intensive in
>>> terms of development.
>>>
>>> Stéphane
>>>
>>>
>>>
>>> On 2015-09-14 08:57, Denis Zgonjanin wrote:
>>>
>>> Right now CKAN is great for static sources of data, which is really all
>>> that existed from government sources when CKAN was first written.
>>>
>>> Now that government is (slowly) catching on, more stream, API, and even
>>> real-time data is being published. CKAN doesn't do a great job here. The
>>> biggest obstacle to creating nice extensions to CKAN for non-file data is
>>> that Pylons is still firmly stuck within the HTTP request-response
>>> lifecycle.
>>>
>>> This worked well for CRUD apps, but is now really showing its
>>> limitations. It's hard to do anything in CKAN that doesn't take place
>>> within the context of a user's HTTP request. If you want to do some extra
>>> data processing on the side, you have to use Celery queues or, worse, cron.
>>> Worse yet, some people do try to put extra processing inside the
>>> request-response lifecycle, causing problems.
>>>
>>> Even core CKAN is guilty of this. For example, CKAN calls DataPusher
>>> to send upload jobs and retrieve job results, and those requests to
>>> DataPusher happen while the user is waiting for the request to return. This
>>> is kind of terrible, not so much because somebody did it this way, but
>>> because CKAN doesn't give you a sane alternative to do it properly.
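
A minimal in-process sketch of the kind of alternative being asked for, using only the standard library's `queue` and `threading` (an assumption for illustration, not how CKAN or DataPusher work): the request handler records a job and returns immediately, and a background worker performs the slow DataPusher call. `handle_upload`, `push_to_datastore`, and `job_status` are hypothetical names, not CKAN functions.

```python
import queue
import threading
import time
import uuid
from typing import Dict, Tuple

job_queue: "queue.Queue[Tuple[str, str]]" = queue.Queue()
job_states: Dict[str, str] = {}
states_lock = threading.Lock()


def set_state(job_id: str, state: str) -> None:
    with states_lock:
        job_states[job_id] = state


def push_to_datastore(resource_id: str) -> None:
    # Hypothetical stand-in for the slow round trip to DataPusher.
    time.sleep(0.2)


def worker() -> None:
    """Background thread: does the slow work outside any user request."""
    while True:
        job_id, resource_id = job_queue.get()
        set_state(job_id, "running")
        try:
            push_to_datastore(resource_id)
            set_state(job_id, "complete")
        except Exception:
            set_state(job_id, "error")
        finally:
            job_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_upload(resource_id: str) -> Dict[str, str]:
    """What a request handler would do: record the job, enqueue it, return at once."""
    job_id = uuid.uuid4().hex
    set_state(job_id, "pending")
    job_queue.put((job_id, resource_id))
    return {"job_id": job_id, "status": "pending"}


def job_status(job_id: str) -> str:
    """What a later status request (e.g. polled by the page) would return."""
    with states_lock:
        return job_states.get(job_id, "unknown")
```

`handle_upload("resource-123")` returns well before the simulated 0.2 s push finishes, and `job_status` later reports `"complete"`. With several WSGI worker processes each process would have its own queue and state table, which is why a shared broker such as Celery's is normally used; the sketch only shows the decoupling of the user's request from the slow call.
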
>>>
>>> Porting CKAN to Flask is no small feat, so let's make sure we do it
>>> right. Now that we're not using CKAN just to host static files anymore, we
>>> need better, built-in async support in CKAN. Perhaps this means
>>> moving to Python 3, where we'll have asyncio (and hopefully a future version
>>> of Flask will work well with it). Other frameworks, like Tornado, are also
>>> quite lightweight and support this out of the box on Python 2.x.
>>>
>>> - Denis
>>>
>>>
>>> On Mon, Sep 14, 2015 at 3:56 AM, Angelos Tzotsos <gcpp.kalxas at gmail.com>
>>> wrote:
>>>
>>>> On 09/14/2015 10:24 AM, Ross Jones wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I’ve recently been playing about with implementing parts of CKAN in
>>>>> Flask side-by-side with the current Pylons implementation. I’m doing it
>>>>> like this so that it isn’t immediately obvious that a migration towards
>>>>> Flask is happening (i.e. nothing breaks). I don’t think this
>>>>> branch should ever be merged; it’s more exploratory, but it has raised some
>>>>> questions that I think it would be good to discuss.
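
The message does not say how the branch wires the two frameworks together. One common way to run two WSGI applications side by side is a small dispatcher in front of both, so that only URLs already ported reach the new app. The sketch below uses plain WSGI with stub apps; `SideBySide`, `make_app`, `call`, and the `/api/3` prefix are hypothetical. (Werkzeug's `DispatcherMiddleware` does something similar by mounting applications under path prefixes.)

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def make_app(name: str) -> WSGIApp:
    """Hypothetical stand-in for the real Flask or Pylons WSGI application."""
    def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"{name}: {environ['PATH_INFO']}".encode("utf-8")]
    return app


class SideBySide:
    """Route already-ported URL prefixes to the new app and everything else to the old one."""

    def __init__(self, new_app: WSGIApp, old_app: WSGIApp, ported: Tuple[str, ...]) -> None:
        self.new_app = new_app
        self.old_app = old_app
        self.ported = ported

    def _is_ported(self, path: str) -> bool:
        # "/api/3" matches "/api/3" and "/api/3/..." but not "/api/30".
        return any(path == p or path.startswith(p.rstrip("/") + "/") for p in self.ported)

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")
        target = self.new_app if self._is_ported(path) else self.old_app
        return target(environ, start_response)


app = SideBySide(make_app("flask"), make_app("pylons"), ported=("/api/3",))


def call(path: str) -> str:
    """Minimal test client: invoke the WSGI callable and return the body as text."""
    statuses: List[str] = []
    body = app({"PATH_INFO": path}, lambda status, headers: statuses.append(status))
    return b"".join(body).decode("utf-8")
```

Because the dispatch happens at the WSGI layer, URLs stay the same for clients while routes move over one prefix at a time. It does not remove the constraint raised in question 2 below: both apps still run in the same Python 2 interpreter.
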
>>>>>
>>>>> WARNING:anecdata
>>>>> It’s pretty clear that the vast majority of people asked would like to
>>>>> move to Flask as a replacement for some layers of the system (leaving
>>>>> things like logic and plugins alone).
>>>>> ENDWARNING
>>>>>
>>>>> We’ve discussed this at the tech-team meetings, but I think a longer, more
>>>>> accessible conversation would be beneficial.
>>>>>
>>>>> 1. What version of CKAN should be targeted? Common sense suggests 3.0,
>>>>> but that being the case, exactly how far can we go in breaking
>>>>> backward compatibility? This isn’t really a technical question; it would be
>>>>> good to hear what the community would accept…
>>>>>
>>>>> 2. Does it *really* need to be side-by-side? Running Flask and Pylons
>>>>> side-by-side means staying on Python 2 for another few years (because of
>>>>> Pylons). A reasonably deep incision and removal of non-logic/non-plugin
>>>>> code would make a move to Py3 easier, but with some level of breakage in
>>>>> external plugins. Staying on 2 would mean a move to 3 at a later date, with
>>>>> more pain.
>>>>>
>>>>> 3. Would the CKAN Association like to fund someone to do some of this
>>>>> work? This is just one of several ideas mentioned on
>>>>> https://github.com/ckan/ideas-and-roadmap/issues/152 that really
>>>>> needs to be done if CKAN is going to thrive instead of just survive.
>>>>>
>>>>> Any feedback welcome…
>>>>>
>>>>> Cheers
>>>>>
>>>>> Ross.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ckan-dev mailing list
>>>>> ckan-dev at lists.okfn.org
>>>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>>>
>>>>
>>>> Hi Ross,
>>>>
>>>> I believe that a Flask port (or rewrite) is an excellent idea for CKAN
>>>> 3.0 in order to support Python 3.x.
>>>> The alternative would be to port Pylons to Python 3.x, which is perhaps
>>>> a more difficult task...
>>>>
>>>> Given that Python 2.x will reach end-of-life (EOL) relatively soon, CKAN
>>>> should move forward.
>>>>
>>>> Just my 2 cents.
>>>>
>>>> Best,
>>>> Angelos
>>>>
>>>> --
>>>> Angelos Tzotsos, PhD
>>>> OSGeo Charter Member
>>>> http://users.ntua.gr/tzotsos
>>>>
>>>>
>>>
>>
>
> --
> *STEVEN DE COSTA* | *EXECUTIVE DIRECTOR*
> www.linkdigital.com.au