[ckan-dev] Future, flask, breaking things, funding.

Karissa McKelvey karissa.mckelvey at gmail.com
Tue Sep 15 03:39:03 UTC 2015


Hey Steven! Yeah, let's get together. I'll be around those days.

On Mon, Sep 14, 2015 at 7:20 PM, Steven De Costa <
steven.decosta at linkdigital.com.au> wrote:

> I'll be in San Francisco 4-6 October if you want to catch up and look at
> it together, Karissa?
>
> I also have some thoughts about remaining flexible in the storage types
> that CKAN might support. Basically, it would be nice if these were
> abstracted into an API and created via the admin as provisioning requests.
> This would allow a platform to provision a variety of storage options and
> enable them at a resource level similar to the resource views at the UI
> level. It would also allow for network level security models to be
> employed, or data storage sovereignty to be maintained in accordance with
> jurisdictional or security classification. Maybe we could call these
> resource containers?
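To make the "resource containers" idea concrete, here is a minimal Python sketch. It assumes storage backends sit behind one small interface, an admin provisions them into a registry, and each resource is then pinned to a container, with an optional jurisdiction check standing in for the sovereignty rules mentioned above. `ResourceContainer`, `InMemoryContainer` and `ContainerRegistry` are hypothetical names, not part of CKAN.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ResourceContainer(ABC):
    """Hypothetical storage backend that an admin could provision."""

    def __init__(self, name: str, jurisdiction: str) -> None:
        self.name = name
        self.jurisdiction = jurisdiction  # where the data physically lives

    @abstractmethod
    def put(self, resource_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, resource_id: str) -> bytes: ...


class InMemoryContainer(ResourceContainer):
    """Toy backend standing in for S3, local disk, a Dat archive, etc."""

    def __init__(self, name: str, jurisdiction: str) -> None:
        super().__init__(name, jurisdiction)
        self._blobs: Dict[str, bytes] = {}

    def put(self, resource_id: str, data: bytes) -> None:
        self._blobs[resource_id] = data

    def get(self, resource_id: str) -> bytes:
        return self._blobs[resource_id]


class ContainerRegistry:
    """Admin-provisioned containers, enabled per resource (hypothetical)."""

    def __init__(self) -> None:
        self._containers: Dict[str, ResourceContainer] = {}
        self._assignments: Dict[str, str] = {}  # resource_id -> container name

    def provision(self, container: ResourceContainer) -> None:
        self._containers[container.name] = container

    def assign(self, resource_id: str, container_name: str,
               required_jurisdiction: Optional[str] = None) -> None:
        container = self._containers[container_name]
        # Refuse placements that would break a data-sovereignty requirement.
        if required_jurisdiction and container.jurisdiction != required_jurisdiction:
            raise ValueError(f"{container_name} is in {container.jurisdiction}, "
                             f"not {required_jurisdiction}")
        self._assignments[resource_id] = container_name

    def store(self, resource_id: str, data: bytes) -> None:
        self._containers[self._assignments[resource_id]].put(resource_id, data)

    def load(self, resource_id: str) -> bytes:
        return self._containers[self._assignments[resource_id]].get(resource_id)


registry = ContainerRegistry()
registry.provision(InMemoryContainer("au-secure", "AU"))
registry.provision(InMemoryContainer("us-bulk", "US"))
registry.assign("budget-2015", "au-secure", required_jurisdiction="AU")
registry.store("budget-2015", b"year,amount\n2015,100\n")
print(registry.load("budget-2015"))  # b'year,amount\n2015,100\n'
```

Keeping the per-resource assignment separate from the containers themselves is what would let a platform add or retire storage options without touching the resources already placed elsewhere.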
>
> Happy to catch up with anyone in SF re CKAN :) In fact, happy to run a
> meetup there if there is interest... physical + video conference.
>
> I'm in Vegas for re:Invent on the 7th to 9th too :) It would be good to
> form a huddle of CKANers at the re:Play party on the 8th!
>
> Cheers,
> Steven
>
> *STEVEN DE COSTA* |
> *EXECUTIVE DIRECTOR* www.linkdigital.com.au
>
>
>
> On 15 September 2015 at 08:41, Karissa McKelvey <
> karissa.mckelvey at gmail.com> wrote:
>
>> I think Dat would be a great way to allow programmatic access to datasets
>> in CKAN. Dat handles streaming data very well. I imagine being able to
>> replace the `.csv` with a `.dat` and get streaming and incremental uploads
>> and downloads.
>>
>> Dat has a two-phase sync process: the first phase computes the differences
>> between the local and remote copies, and the second syncs only the data that
>> differs. This means users never have to download the same data twice, and
>> it reduces bandwidth costs for the host. Because Dat knows the differences
>> between each data version, it is also a really lightweight way to see an
>> overview of previous data versions for a single dataset.
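A generic, minimal Python illustration of the diff-then-sync idea described above, assuming each dataset is a mapping of row keys to serialized rows and that the remote can cheaply send one hash per row. This is not Dat's actual protocol or API; `digest`, `compute_diff` and `sync` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List

Rows = Dict[str, str]  # row key -> serialized row


def digest(row: str) -> str:
    return hashlib.sha256(row.encode("utf-8")).hexdigest()


def compute_diff(local: Rows, remote_digests: Dict[str, str]) -> List[str]:
    """Phase 1: compare hashes only and list rows missing or changed locally."""
    return [key for key, d in remote_digests.items()
            if key not in local or digest(local[key]) != d]


def sync(local: Rows, remote: Rows) -> int:
    """Phase 2: copy only the differing rows; return how many were transferred."""
    # In a real system the remote would send these hashes; here we compute
    # them locally for the demo.
    remote_digests = {key: digest(row) for key, row in remote.items()}
    changed = compute_diff(local, remote_digests)
    for key in changed:
        local[key] = remote[key]
    # Rows deleted upstream are dropped without transferring any data.
    for key in [k for k in local if k not in remote]:
        del local[key]
    return len(changed)


local = {"1": "a,1", "2": "b,2"}
remote = {"1": "a,1", "2": "b,20", "3": "c,3"}
print(sync(local, remote))  # 2: row "2" changed and row "3" is new
print(sync(local, remote))  # 0: nothing left to download
```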
>>
>> I'd be happy to chat more about how this might work in practice!
>>
>> Cheers,
>>
>>> On Mon, Sep 14, 2015 at 2:51 PM, Joel Natividad <
>>> joel.natividad at ontodia.com> wrote:
>>>
>>>> Hi all,
>>>> What about integrating with Dat <http://dat-data.com>?
>>>>
>>>> It handles streaming data; can handle huge datasets; can do deltas (no
>>>> need to re-download a huge dataset over and over again); has versions (not
>>>> just revisions, as data consumers have legitimate reasons to use different
>>>> versions of data, down to the row level); and makes CKAN more "dog-fooding"
>>>> friendly (i.e. publishers using it not only to publish data, but to
>>>> actually build solutions).
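Joel's point about versions "down to the row level" can be pictured as an append-only log of row changes from which any earlier version can be rebuilt. The following minimal Python sketch assumes that model; `VersionedTable` is a hypothetical class and says nothing about how Dat actually stores versions.

```python
from typing import Dict, List, Optional, Tuple

Change = Dict[str, Optional[str]]  # row key -> new value, or None for a deletion


class VersionedTable:
    """Append-only history of row changes (hypothetical illustration)."""

    def __init__(self) -> None:
        self._log: List[Change] = []

    def commit(self, changes: Change) -> int:
        self._log.append(dict(changes))
        return len(self._log)  # version numbers start at 1

    def checkout(self, version: int) -> Dict[str, str]:
        """Rebuild the table as it looked at `version` by replaying the log."""
        if not 0 <= version <= len(self._log):
            raise ValueError(f"unknown version {version}")
        state: Dict[str, str] = {}
        for changes in self._log[:version]:
            for key, value in changes.items():
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def history(self, key: str) -> List[Tuple[int, Optional[str]]]:
        """Every version that touched one row, so a consumer can pin the one they need."""
        return [(i + 1, changes[key])
                for i, changes in enumerate(self._log) if key in changes]


table = VersionedTable()
table.commit({"r1": "10", "r2": "20"})
table.commit({"r1": "11"})
table.commit({"r2": None})
print(table.checkout(1))    # {'r1': '10', 'r2': '20'}
print(table.checkout(3))    # {'r1': '11'}
print(table.history("r1"))  # [(1, '10'), (2, '11')]
```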
>>>>
>>>> Marianne Bellotti (CKAN-powered HDX) and I independently spent some
>>>> quality time with Karissa McKelvey - one of the three key developers
>>>> behind Dat <http://dat-data.com/team>, when she was in NYC last month
>>>> and discussed at length how Dat + CKAN can work together.
>>>>
>>>> Karissa even put together a rough spec on a "ckanext-dat" extension.
>>>>
>>>> FYI, Dat is supported by usopendata.org
>>>> <https://usopendata.org/2015/07/29/dat-beta/>, which also happens to
>>>> be the org behind CKAN-Multisite, which was just announced as
>>>> generally available today.
>>>> <https://usopendata.org/2015/09/14/ckan-multisite/>
>>>>
>>>> Best,
>>>> Joel
>>>>
>>>>
>>>> --
>>>> Joel Natividad
>>>> +1 347-565-5635
>>>> @jqnatividad
>>>>
>>>> Ontodia, Inc.
>>>> 137 Varick Street, 2nd Floor, New York, NY 10013
>>>>
>>>> On Mon, Sep 14, 2015 at 5:11 PM, Steven De Costa <
>>>> steven.decosta at linkdigital.com.au> wrote:
>>>>
>>>>> I'm 'all in' on this discussion :) I'll set up a Doodle poll and we can
>>>>> pick a time to do a video call...
>>>>>
>>>>> My 2c on some points.
>>>>>
>>>>> 1. Perhaps the redevelopment could be bottom-up: start with resources
>>>>> and widen what they can do. CRUD can then be rebuilt over the top.
>>>>> 2. Carefully consider the longest term possible and how the app may
>>>>> mature in the future.
>>>>> 3. Consider interoperability between n+1 platforms via linked open
>>>>> data, again with realtime in mind.
>>>>> 4. Consider packages further. Could we add new package types that are
>>>>> built on 3.0 thinking and have them coexist with current packages? If so,
>>>>> then existing extensions could be modified less dramatically to apply only
>>>>> to v2 packages.
>>>>> 5. Think about migration scenarios. Could a v2 CKAN remain as a dumb
>>>>> web app harvesting from a 3.0 instance? If so, we could prioritise
>>>>> workflows around custodians and ETL before end users.
>>>>> 6. Yes I'm sure others in the steering group would support the work.
>>>>> Just remember they are also just volunteers :)
>>>>> 7. Yes I'm sure funding could come from the Association, just so long
>>>>> as funding first goes into the association. So, we'd all have a part to
>>>>> play in signing up paying members - happy to take any leads from people on
>>>>> that point :)
>>>>>
>>>>> Hoots!
>>>>>
>>>>>
>>>>> On Tuesday, September 15, 2015, Denis Zgonjanin <
>>>>> deniszgonjanin at gmail.com> wrote:
>>>>>
>>>>>> Yes, we should think of use cases. Realtime data is just one. I'm not
>>>>>> just talking about things we might want to do. Here are the current things
>>>>>> in CKAN that would benefit from better asynchronous support:
>>>>>>
>>>>>> - Datastore & Datapusher. We could integrate datapusher into CKAN, so
>>>>>> people don't need to set up an additional web service just to use stock
>>>>>> CKAN.
>>>>>> - Harvesting. Set up a periodic callback that calls harvest sources
>>>>>> every hour. Super easy compared to having to set up Redis/ZeroMQ, and
>>>>>> another 3(!) long-running processes in the background.
>>>>>> - Webhooks. They must be pushed off to a Celery queue because of
>>>>>> Pylons. With async they could be fired off easily.
>>>>>> - Analytics & analytics reports; Sending automated emails and other
>>>>>> automated tasks.
>>>>>> - Anything where right now we have to set up cron jobs.
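With an event loop, the periodic harvest described in the list above needs no cron entry or separate consumer process. A minimal sketch using Python 3's standard `asyncio`, assuming a hypothetical `harvest_all_sources` coroutine; the hourly interval is shortened in the demo so it finishes quickly, and none of this is existing CKAN code.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def run_periodically(interval_s: float,
                           job: Callable[[], Awaitable[None]],
                           max_runs: Optional[int] = None) -> int:
    """Await `job` every `interval_s` seconds; stop after `max_runs` if given."""
    runs = 0
    while True:
        await job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        await asyncio.sleep(interval_s)


harvest_log: List[str] = []


async def harvest_all_sources() -> None:
    # Hypothetical stand-in for fetching every configured harvest source.
    harvest_log.append("harvested")


async def main() -> None:
    # In production this would be interval_s=3600 with no max_runs.
    harvester = asyncio.create_task(
        run_periodically(0.01, harvest_all_sources, max_runs=3))
    # While the harvester sleeps, the same loop is free to serve other work,
    # e.g. web requests or queued webhooks.
    await harvester


if __name__ == "__main__":
    asyncio.run(main())
    print(harvest_log)  # ['harvested', 'harvested', 'harvested']
```

The design point is that the scheduler is just another coroutine in the web process, so the "3(!) long-running processes" collapse into tasks on one loop; the trade-off is that a blocking call inside `job` would stall everything else on that loop.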
>>>>>>
>>>>>> And probably most importantly - CKAN is going to need a face lift
>>>>>> eventually if it's to remain relevant. It can't be stuck in CRUD land
>>>>>> forever. There is plenty of time for this, no rush. But building cool,
>>>>>> shiny new things with fancy front-end JavaScript would be hard right now.
>>>>>> It will be hard on any web framework built on the idea that your whole
>>>>>> application context is transferred to the user on every HTTP request, and
>>>>>> that nothing else except that is going on in the backend.
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 9:34 AM, Stéphane Guidoin <
>>>>>> stephane.guidoin at gmail.com> wrote:
>>>>>>
>>>>>>> *Now that government is (slowly) catching on, more stream, API, and
>>>>>>> even real-time data is being published. CKAN doesn't do a great job here.
>>>>>>> The biggest obstacle to creating nice extensions to CKAN for non-file data
>>>>>>> is that Pylons is still firmly stuck within the HTTP request-response
>>>>>>> lifecycle. *
>>>>>>>
>>>>>>> I wonder what the role of CKAN should be when it comes to APIs,
>>>>>>> streams and the like. These tend to be fairly resource intensive,
>>>>>>> and most of the time they are developed and hosted on their own, not on
>>>>>>> the open data portal. So what should CKAN's role be here? How much
>>>>>>> do we want to be able to integrate CKAN with APIs and streams, and what
>>>>>>> should it provide?
>>>>>>>
>>>>>>> From my point of view, moving to Flask or another framework is mostly
>>>>>>> a question of technical debt (
>>>>>>> https://18f.gsa.gov/2015/08/07/technical-debt-1/) and of making sure
>>>>>>> CKAN remains flexible (and built-in async would indeed help).
>>>>>>>
>>>>>>> When it comes to supporting realtime data, even if it's mainly to
>>>>>>> enable extension development, some thinking about use cases is needed
>>>>>>> to avoid jumping into something that would be very time-intensive
>>>>>>> to develop.
>>>>>>>
>>>>>>> Stéphane
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2015-09-14 08:57, Denis Zgonjanin wrote:
>>>>>>>
>>>>>>> Right now CKAN is great for static sources of data, which is really
>>>>>>> all that existed from government sources when CKAN was first written.
>>>>>>>
>>>>>>> Now that government is (slowly) catching on, more stream, API, and
>>>>>>> even real-time data is being published. CKAN doesn't do a great job here.
>>>>>>> The biggest obstacle to creating nice extensions to CKAN for non-file data
>>>>>>> is that Pylons is still firmly stuck within the HTTP request-response
>>>>>>> lifecycle.
>>>>>>>
>>>>>>> This worked well for CRUD apps, but is now really showing its
>>>>>>> limitations. It's hard to do anything in CKAN that doesn't take place
>>>>>>> within the context of a user's HTTP request. If you want to do some extra
>>>>>>> data processing on the side, you have to use Celery queues or, worse, cron.
>>>>>>> Worse yet, some people do try to put extra processing inside the
>>>>>>> request-response lifecycle, causing problems.
>>>>>>>
>>>>>>> Even core CKAN is guilty of this. For example, CKAN will call
>>>>>>> datapusher to send upload jobs and retrieve job results, and those requests
>>>>>>> to datapusher happen while the user is waiting for the request to return.
>>>>>>> This is kind of terrible. Not even because somebody did it this way, but
>>>>>>> because CKAN doesn't give you a sane alternative to do it properly.
>>>>>>>
>>>>>>> Porting CKAN to Flask is no small feat, so let's make sure we do it
>>>>>>> right. Now that we're not using CKAN to just host static files anymore, we
>>>>>>> need better, built-in async support in CKAN. Perhaps this means
>>>>>>> moving to Python 3, where we'll have asyncio (and hopefully a future version
>>>>>>> of Flask will work well with it). Other frameworks, like Tornado, are also
>>>>>>> quite lightweight and support this out of the box on Python 2.x.
>>>>>>>
>>>>>>> - Denis
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 14, 2015 at 3:56 AM, Angelos Tzotsos <
>>>>>>> gcpp.kalxas at gmail.com> wrote:
>>>>>>>
>>>>>>>> On 09/14/2015 10:24 AM, Ross Jones wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I’ve recently been playing about with implementing parts of CKAN
>>>>>>>>> in Flask side-by-side with the current Pylons implementation. I’m doing it
>>>>>>>>> like this so that it isn’t immediately obvious that there’s a migration
>>>>>>>>> happening towards using Flask (aka nothing breaks).  I don’t think this
>>>>>>>>> branch should ever be merged, it’s more exploratory but it has raised some
>>>>>>>>> questions that I think it would be good to discuss.
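One way to run the two frameworks side by side is a thin WSGI dispatcher in front of both apps: requests for URL prefixes already ported go to Flask, everything else to Pylons. The sketch below uses only the standard library, with two trivial WSGI apps standing in for the real Flask and Pylons applications; `SideBySideDispatcher` and the `/api` prefix are illustrative assumptions, not necessarily how the exploratory branch does it. (Werkzeug's `DispatcherMiddleware` offers a similar prefix-based mount.)

```python
from typing import Callable, Iterable, List
from wsgiref.util import setup_testing_defaults

WSGIApp = Callable[..., Iterable[bytes]]


def make_text_app(label: str) -> WSGIApp:
    """Trivial WSGI app that reports which framework handled the request."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"{label}: {environ['PATH_INFO']}".encode("utf-8")]
    return app


flask_app = make_text_app("flask")    # stand-in for the new Flask application
pylons_app = make_text_app("pylons")  # stand-in for the existing Pylons application


class SideBySideDispatcher:
    """Route ported URL prefixes to Flask and everything else to Pylons."""

    def __init__(self, flask_app: WSGIApp, pylons_app: WSGIApp,
                 flask_prefixes: List[str]) -> None:
        self.flask_app = flask_app
        self.pylons_app = pylons_app
        self.flask_prefixes = [p.rstrip("/") for p in flask_prefixes]

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix in self.flask_prefixes:
            # Match "/api" and "/api/...", but not "/apiary".
            if path == prefix or path.startswith(prefix + "/"):
                return self.flask_app(environ, start_response)
        return self.pylons_app(environ, start_response)


def call(app: WSGIApp, path: str) -> str:
    """Invoke a WSGI app in-process and return the response body."""
    environ: dict = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    # A real server's start_response also returns a write() callable;
    # these toy apps never use it.
    body = app(environ, lambda status, headers: None)
    return b"".join(body).decode("utf-8")


app = SideBySideDispatcher(flask_app, pylons_app, ["/api"])
print(call(app, "/api/3/action/package_list"))  # flask: /api/3/action/package_list
print(call(app, "/dataset"))                     # pylons: /dataset
```

Porting then becomes a matter of moving one prefix at a time into the Flask list, which is what keeps "nothing breaks" true during the migration.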
>>>>>>>>>
>>>>>>>>> WARNING:anecdata
>>>>>>>>> It’s pretty clear that the vast majority of people asked would
>>>>>>>>> like to move to Flask as a replacement for some layers of the system
>>>>>>>>> (leaving things like logic and plugins alone).
>>>>>>>>> ENDWARNING
>>>>>>>>>
>>>>>>>>> We’ve discussed at the tech-team meetings, but I think a longer,
>>>>>>>>> more accessible conversation would be beneficial.
>>>>>>>>>
>>>>>>>>> 1. What version of CKAN should be targeted? Common sense suggests
>>>>>>>>> 3.0, but that being the case, exactly how far can we go in breaking some
>>>>>>>>> backward compatibility?  This isn’t really a technical question - would be
>>>>>>>>> good to hear what the community would accept …
>>>>>>>>>
>>>>>>>>> 2. Does it *really* need to be side-by-side?  Running Flask and
>>>>>>>>> Pylons side-by-side means staying on Python 2 for another few years
>>>>>>>>> (because Pylons).  A reasonably deep incision and removal of
>>>>>>>>> non-logic/non-plugin code would make a move to Py3 easier, but with some
>>>>>>>>> level of breakage in external plugins. Staying on 2 would mean a move to 3
>>>>>>>>> at a later date and more pain.
>>>>>>>>>
>>>>>>>>> 3. Would the CKAN Association like to fund someone to do some of
>>>>>>>>> this work? This is just one of several ideas mentioned on
>>>>>>>>> https://github.com/ckan/ideas-and-roadmap/issues/152 that really
>>>>>>>>> needs to be done if CKAN is going to thrive instead of just survive.
>>>>>>>>>
>>>>>>>>> Any feedback welcome…
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Ross.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ckan-dev mailing list
>>>>>>>>> ckan-dev at lists.okfn.org
>>>>>>>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>>>>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Ross,
>>>>>>>>
>>>>>>>> I believe that a Flask port (or rewrite) is an excellent idea for
>>>>>>>> CKAN 3.0 in order to support Python 3.x.
>>>>>>>> The alternative would be to port Pylons to Python 3.x, which
>>>>>>>> perhaps is a more difficult task...
>>>>>>>>
>>>>>>>> Given that Python 2.x will reach end of life (EOL) relatively soon, CKAN
>>>>>>>> should move forward.
>>>>>>>>
>>>>>>>> Just my 2 cents.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Angelos
>>>>>>>>
>>>>>>>> --
>>>>>>>> Angelos Tzotsos, PhD
>>>>>>>> OSGeo Charter Member
>>>>>>>> http://users.ntua.gr/tzotsos
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> *STEVEN DE COSTA* |
>>>>> *EXECUTIVE DIRECTOR* www.linkdigital.com.au
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Karissa McKelvey
>>> http://karissa.github.io/ <http://karissamck.com>
>>>
>>>
>>
>>
>> --
>> Karissa McKelvey
>> http://karissa.github.io/
>>
>


-- 
Karissa McKelvey
http://karissa.github.io/