[ckan-dev] Dat Integration

Marianne Bellotti marianne at exversion.com
Tue Sep 15 03:52:16 UTC 2015


So treating them as binary files would essentially give you only the same
level of version control that CKAN already provides out of the box?


Date: Mon, 14 Sep 2015 20:37:50 -0700
> From: Karissa McKelvey <karissa.mckelvey at gmail.com>
> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
> Subject: Re: [ckan-dev] Dat Integration
> Message-ID: <CAMmgt+yEWKGbDU04JrP5pus0Lhy+WRdmWsoiEubGL+ra-a27Gw at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Dat has the ability to write binary files to disk, so you don't need
> to supply a key. In that mode, dat doesn't parse the data using a
> tabular parser (e.g., CSV, JSON, XLSX).
>
> In that scenario, though, it can't do diffs by row. If the user
> supplies the key (probably best done through the UI), then dat
> could version the differences by row in the table.
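
To make the difference between the two modes concrete, here is a minimal
sketch in plain Python. It does not use dat's API; the function names
(`blob_changed`, `diff_rows`) and the sample data are made up for
illustration. Without a key, all that can be said is that the file changed;
with a key, the change can be broken down by row.

```python
import csv
import hashlib
import io


def blob_changed(old_bytes, new_bytes):
    """Binary mode: we can only tell whether the file as a whole changed."""
    return hashlib.sha256(old_bytes).hexdigest() != hashlib.sha256(new_bytes).hexdigest()


def diff_rows(old_csv, new_csv, key):
    """Keyed mode: compare two CSV texts row by row, using `key` as the row id."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
    }


# Hypothetical sample data: two versions of the same small table.
OLD = "id,city,pop\n1,Oslo,600\n2,Bergen,270\n"
NEW = "id,city,pop\n1,Oslo,650\n3,Trondheim,190\n"

print(blob_changed(OLD.encode(), NEW.encode()))
print(diff_rows(OLD, NEW, key="id"))
```

The keyed comparison only works if the `id` column really identifies a row
in both versions, which is exactly the question about keys raised further
down the thread.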
>
> On Mon, Sep 14, 2015 at 7:50 PM, Marianne Bellotti
> <marianne at exversion.com> wrote:
> > Well since Joel is going to pull me into this discussion I might as well
> > give my thoughts :)
> >
> > The one thing I keep coming back to with Dat integration for CKAN is
> > keys. How will CKAN know which column in the dataset to use as the
> > primary key for Dat's version control? Without assigning a key, any new
> > versions will just be added as new data in Dat, so it's sort of an
> > important thing.
> >
> > Does CKAN guess it? Does the user assign it through the resource form? If
> > the user assigns it, how do you communicate to non-technical people
> > exactly what information they are supposed to supply? What happens when
> > someone does something wrong? Is version control turned off or just
> > allowed to run incorrectly? What happens if the data doesn't actually
> > have a unique key that the user can assign?
> >
> > I still have not come up with good answers to these questions.
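
One possible starting point for the "does CKAN guess it?" option, sketched
in plain Python under the assumption that the resource is a CSV small enough
to read into memory. `candidate_keys` is a hypothetical helper, not part of
CKAN or dat. It only reports columns that are non-empty and unique in this
one snapshot, which is necessary but not sufficient for a stable key across
versions.

```python
import csv
import io


def candidate_keys(csv_text):
    """Return the columns whose values are all non-empty and unique."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        # With no data rows there is nothing to base a guess on.
        return []
    candidates = []
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        if all(values) and len(set(values)) == len(values):
            candidates.append(column)
    return candidates


# Hypothetical sample: only "id" is unique in every row.
SAMPLE = "id,city,country\n1,Oslo,NO\n2,Bergen,NO\n3,Oslo,US\n"
print(candidate_keys(SAMPLE))
```

An empty result would correspond to the "no unique key" case, where the
resource form could fall back to binary mode or ask the user.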
> >
> > -Marianne
> >
> > On Mon, Sep 14, 2015 at 10:20 PM, <ckan-dev-request at lists.okfn.org> wrote:
> >>
> >>
> >> Today's Topics:
> >>
> >>    1. Re: Future, flask, breaking things, funding. (Steven De Costa)
> >>
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Tue, 15 Sep 2015 12:20:26 +1000
> >> From: Steven De Costa <steven.decosta at linkdigital.com.au>
> >> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
> >> Subject: Re: [ckan-dev] Future, flask, breaking things, funding.
> >> Message-ID: <CAMp=Osb76LGU294W4wcUnJpG0ZO2us6yUdPXJ3Sg85Q+sFZ9PQ at mail.gmail.com>
> >> Content-Type: text/plain; charset="utf-8"
> >>
> >> I'll be in San Francisco 4-6 October if you wanted to catch up and look
> >> at it together, Karissa?
> >>
> >> I also have some thoughts about remaining flexible in the storage types
> >> that CKAN might support. Basically, it would be nice if these were
> >> abstracted into an API and created via the admin as provisioning
> >> requests. This would allow a platform to provision a variety of storage
> >> options and enable them at a resource level, similar to the resource
> >> views at the UI level. It would also allow for network-level security
> >> models to be employed, or data storage sovereignty to be maintained in
> >> accordance with jurisdictional or security classification. Maybe we
> >> could call these resource containers?
> >>
> >> Happy to catch up with anyone in SF re CKAN :) In fact, happy to run a
> >> meetup there if there is interest... physical + video conference.
> >>
> >> I'm in Vegas for re:Invent on the 7th to 9th too :) It would be good to
> >> form a huddle of CKANers at the re:Play party on the 8th!
> >>
> >> Cheers,
> >> Steven
> >>
> >> *STEVEN DE COSTA *|
> >> *EXECUTIVE DIRECTOR*www.linkdigital.com.au
> >>
> >>
> >>
> >> On 15 September 2015 at 08:41, Karissa McKelvey
> >> <karissa.mckelvey at gmail.com> wrote:
> >>
> >> > I think Dat would be a great way to allow programmatic access to
> >> > datasets in CKAN. Dat handles streaming data very well. I imagine
> >> > being able to replace the `.csv` with a `.dat` and get streaming and
> >> > incremental uploads and downloads.
> >> >
> >> > Dat has a two-phase sync process: the first phase computes the
> >> > differences between the local and remote copy, and the second syncs
> >> > the data that is different. This means users never have to download
> >> > the same data twice, and it reduces bandwidth costs for the host.
> >> > Because dat knows the differences between each data version, it is
> >> > also a really lightweight way to see an overview of previous data
> >> > versions for a single dataset.
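
The two-phase idea can be sketched in a few lines of Python. This is only an
illustration of the general technique (compare digests first, then transfer
what differs), not dat's actual protocol; `plan_sync`, `apply_sync` and the
row data are hypothetical, and deletions are ignored to keep it short.

```python
import hashlib


def digest(value):
    """Short fingerprint of a row, cheap to exchange compared with the row itself."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def plan_sync(local, remote_digests):
    """Phase 1: compare digests only and list the keys that are missing or stale locally."""
    return sorted(
        key
        for key, remote_digest in remote_digests.items()
        if key not in local or digest(local[key]) != remote_digest
    )


def apply_sync(local, remote, keys):
    """Phase 2: transfer only the rows named in the plan."""
    for key in keys:
        local[key] = remote[key]
    return len(keys)


# Hypothetical data: the local copy is one edit and one row behind the remote.
remote = {"row1": "Oslo,650", "row2": "Bergen,270", "row3": "Trondheim,190"}
local = {"row1": "Oslo,600", "row2": "Bergen,270"}

remote_digests = {key: digest(value) for key, value in remote.items()}
needed = plan_sync(local, remote_digests)
transferred = apply_sync(local, remote, needed)
print(needed, transferred)
```

Only two of the three rows cross the wire here, which is where the bandwidth
saving described above would come from.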
> >> >
> >> > I'd be happy to chat more about how this might work in practice!
> >> >
> >> > Cheers,
> >> >
> >> >> On Mon, Sep 14, 2015 at 2:51 PM, Joel Natividad <
> >> >> joel.natividad at ontodia.com> wrote:
> >> >>
> >> >>> Hi all,
> >> >>> What about integrating with Dat <http://dat-data.com>?
> >> >>>
> >> >>> It handles streaming data; can handle huge datasets; can do deltas
> >> >>> (no need to re-download a huge dataset over and over again); has
> >> >>> versions (not just revisions, as data consumers have legitimate
> >> >>> reasons to use different versions of data, down to the row level);
> >> >>> and makes CKAN more "dog-fooding" friendly (i.e. publishers using it
> >> >>> not only to publish data, but to actually build solutions).
> >> >>>
> >> >>> Marianne Bellotti (CKAN-powered HDX) and I independently spent some
> >> >>> quality time with Karissa McKelvey, one of the three key developers
> >> >>> behind Dat <http://dat-data.com/team>, when she was in NYC last
> >> >>> month, and discussed at length how Dat + CKAN can work together.
> >> >>>
> >> >>> Karissa even put together a rough spec on a "ckanext-dat" extension.
> >> >>>
> >> >>> FYI, Dat is supported by usopendata.org
> >> >>> <https://usopendata.org/2015/07/29/dat-beta/>, which also happens to
> >> >>> be the org behind CKAN-Multisite, which was just announced as
> >> >>> generally available today
> >> >>> <https://usopendata.org/2015/09/14/ckan-multisite/>.
> >> >>>
> >> >>> Best,
> >> >>> Joel
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Joel Natividad
> >> >>> +1 347-565-5635
> >> >>> @jqnatividad
> >> >>>
> >> >>> Ontodia, Inc.
> >> >>> 137 Varick Street, 2nd Floor, New York, NY 10013
> >> >>>
> >> >>> On Mon, Sep 14, 2015 at 5:11 PM, Steven De Costa <
> >> >>> steven.decosta at linkdigital.com.au> wrote:
> >> >>>
> >> >>>> I'm 'all in' on this discussion :) I'll set up a Doodle and we can
> >> >>>> pick a time to do a video call...
> >> >>>>
> >> >>>> My 2c on some points.
> >> >>>>
> >> >>>> 1. Perhaps redev could be bottom up. Start with resources and widen
> >> >>>> its ability. CRUD can then be rebuilt over the top.
> >> >>>> 2. Carefully consider the longest term possible and how the app may
> >> >>>> mature in the future.
> >> >>>> 3. Consider interoperability between n+1 platforms via linked open
> >> >>>> data, again with realtime in mind.
> >> >>>> 4. Consider packages further. Could we add new package types that
> >> >>>> are built on 3.0 thinking and have them coexist with current
> >> >>>> packages? If so, then existing extensions could be modified less
> >> >>>> dramatically to apply only to v2 packages.
> >> >>>> 5. Think about migration scenarios. Could a v2 CKAN remain as a
> >> >>>> dumb web app harvesting from a 3.0? If so, we could prioritise
> >> >>>> workflows around custodians and ETL before end users.
> >> >>>> 6. Yes, I'm sure others in the steering group would support the
> >> >>>> work. Just remember they are also just volunteers :)
> >> >>>> 7. Yes, I'm sure funding could come from the Association, just so
> >> >>>> long as funding first goes into the Association. So, we'd all have
> >> >>>> a part to play in signing up paying members - happy to take any
> >> >>>> leads from people on that point :)
> >> >>>>
> >> >>>> Hoots!
> >> >>>>
> >> >>>>
> >> >>>> On Tuesday, September 15, 2015, Denis Zgonjanin <
> >> >>>> deniszgonjanin at gmail.com> wrote:
> >> >>>>
> >> >>>>> Yes, we should think of use cases. Realtime data is just one. I'm
> >> >>>>> not just talking about things we might want to do. Here are the
> >> >>>>> current things in CKAN that would benefit from better asynchronous
> >> >>>>> support:
> >> >>>>>
> >> >>>>> - Datastore & Datapusher. We could integrate datapusher into CKAN,
> >> >>>>> so people don't need to set up an additional web service just to
> >> >>>>> use stock CKAN.
> >> >>>>> - Harvesting. Set up a periodic callback that calls harvest
> >> >>>>> sources every hour. Super easy when compared to having to set up
> >> >>>>> Redis/ZeroMQ, and another 3(!) long-running processes running in
> >> >>>>> the background.
> >> >>>>> - Webhooks. They must be pushed off to a celery queue because of
> >> >>>>> Pylons. With async they could be fired off easily.
> >> >>>>> - Analytics & analytics reports; sending automated emails and
> >> >>>>> other automated tasks.
> >> >>>>> - Anything where right now we have to set up cron jobs.
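
For the harvesting item in the list above, a periodic callback might look
roughly like this with Python 3's asyncio. It is a sketch only:
`run_periodically` and `harvest_all_sources` are hypothetical names, nothing
here is CKAN or ckanext-harvest code, and `max_runs` exists purely so the
demo terminates.

```python
import asyncio

harvest_log = []


async def harvest_all_sources():
    """Hypothetical stand-in for whatever would actually trigger the harvest sources."""
    harvest_log.append("harvested")


async def run_periodically(interval_seconds, job, max_runs=None):
    """Await `job()` every `interval_seconds`; stop after `max_runs` runs if given."""
    runs = 0
    while max_runs is None or runs < max_runs:
        await job()
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs


# An hourly harvest would use interval_seconds=3600 and no max_runs;
# a tiny interval and three runs keep this demo fast.
completed = asyncio.run(run_periodically(0.01, harvest_all_sources, max_runs=3))
print(completed, len(harvest_log))
```

In a real application the coroutine would be scheduled on the web
framework's own event loop rather than started with `asyncio.run`.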
> >> >>>>>
> >> >>>>> And probably most importantly - CKAN is going to need a face lift
> >> >>>>> eventually if it's to remain relevant. It can't be stuck in CRUD
> >> >>>>> land forever. There is plenty of time for this, no rush. But
> >> >>>>> building cool shiny new things with fancy front-end JavaScript
> >> >>>>> would be hard right now. It will be hard on any web framework
> >> >>>>> built on the idea that your whole application context is
> >> >>>>> transferred to the user on every HTTP request, and that nothing
> >> >>>>> else except that is going on in the backend.
> >> >>>>>
> >> >>>>>
> >> >>>>> On Mon, Sep 14, 2015 at 9:34 AM, Stéphane Guidoin <
> >> >>>>> stephane.guidoin at gmail.com> wrote:
> >> >>>>>
> >> >>>>>> *Now that government is (slowly) catching on, more stream, API,
> >> >>>>>> and even real-time data is being published. CKAN doesn't do a
> >> >>>>>> great job here. The biggest obstacle to creating nice extensions
> >> >>>>>> to CKAN for non-file data is that Pylons is still firmly stuck
> >> >>>>>> within the HTTP request-response lifecycle.*
> >> >>>>>>
> >> >>>>>> I wonder what should be the role of CKAN when it comes to APIs,
> >> >>>>>> streams and other things. Those things tend to be fairly
> >> >>>>>> resource-intensive and most of the time, they are developed and
> >> >>>>>> hosted on their own, not on the open data portal. So what should
> >> >>>>>> be the role of CKAN on this? How much do we want to be able to
> >> >>>>>> integrate CKAN with APIs and streams, and what should it give?
> >> >>>>>>
> >> >>>>>> From my point of view, moving to Flask or another framework is
> >> >>>>>> mostly a question of technical debt
> >> >>>>>> (https://18f.gsa.gov/2015/08/07/technical-debt-1/) and making
> >> >>>>>> sure CKAN remains flexible (and built-in async would indeed help).
> >> >>>>>>
> >> >>>>>> When it comes to supporting realtime data, even if it's mainly to
> >> >>>>>> enable extension development, some thinking about use cases is
> >> >>>>>> needed in order to avoid jumping into something that would be
> >> >>>>>> very time-intensive in terms of dev.
> >> >>>>>>
> >> >>>>>> Stéphane
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 2015-09-14 08:57, Denis Zgonjanin wrote:
> >> >>>>>>
> >> >>>>>> Right now CKAN is great for static sources of data, which is
> >> >>>>>> really all that existed from government sources when CKAN was
> >> >>>>>> first written.
> >> >>>>>>
> >> >>>>>> Now that government is (slowly) catching on, more stream, API,
> >> >>>>>> and even real-time data is being published. CKAN doesn't do a
> >> >>>>>> great job here. The biggest obstacle to creating nice extensions
> >> >>>>>> to CKAN for non-file data is that Pylons is still firmly stuck
> >> >>>>>> within the HTTP request-response lifecycle.
> >> >>>>>>
> >> >>>>>> This worked well for CRUD apps, but now is really showing its
> >> >>>>>> limitations. It's hard to do anything in CKAN that doesn't take
> >> >>>>>> place within the context of a user's HTTP request. If you want to
> >> >>>>>> do some extra data processing on the side, you have to use celery
> >> >>>>>> queues or worse, cron. Worse yet, some people do try to put extra
> >> >>>>>> processing inside the request-response lifecycle, causing
> >> >>>>>> problems.
> >> >>>>>>
> >> >>>>>> Even core CKAN is guilty of this. For example, CKAN will call
> >> >>>>>> datapusher to send upload jobs and retrieve job results, and
> >> >>>>>> those requests to datapusher happen while the user is waiting for
> >> >>>>>> the request to return. This is kind of terrible. Not even because
> >> >>>>>> somebody did it this way, but because CKAN doesn't give you a
> >> >>>>>> sane alternative to do it properly.
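
As a rough illustration of the alternative described here, the slow call can
be queued and the request answered immediately. The sketch uses an
in-process thread pool from the standard library as a stand-in; that is an
assumption made for brevity, not how CKAN or datapusher work, and
`push_to_datastore` and `handle_upload_request` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def push_to_datastore(resource_id):
    """Hypothetical slow job standing in for a call out to a service like datapusher."""
    time.sleep(0.05)
    return "pushed " + resource_id


def handle_upload_request(resource_id):
    """Queue the slow work and answer straight away with a handle to the job."""
    future = executor.submit(push_to_datastore, resource_id)
    return {"status": "accepted", "job": future}


response = handle_upload_request("res-123")
print(response["status"])        # available before the job has finished
print(response["job"].result())  # blocks only when the result is actually needed
executor.shutdown()
```

The user-facing request returns as soon as the job is queued, and the
result can be polled or pushed later.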
> >> >>>>>>
> >> >>>>>> Porting CKAN to Flask is no small feat, so let's make sure we do
> >> >>>>>> it right. Now that we're not using CKAN to just host static files
> >> >>>>>> anymore, we need to have better, built-in async support in CKAN.
> >> >>>>>> Perhaps this means moving to Python 3 where we'll have asyncio
> >> >>>>>> (and hopefully a future version of Flask will work well with it).
> >> >>>>>> Other frameworks, like Tornado, are also quite lightweight and
> >> >>>>>> support this out of the box for Python 2.x.
> >> >>>>>>
> >> >>>>>> - Denis
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Mon, Sep 14, 2015 at 3:56 AM, Angelos Tzotsos <
> >> >>>>>> gcpp.kalxas at gmail.com> wrote:
> >> >>>>>>
> >> >>>>>>> On 09/14/2015 10:24 AM, Ross Jones wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi,
> >> >>>>>>>>
> >> >>>>>>>> I've recently been playing about with implementing parts of
> >> >>>>>>>> CKAN in Flask side-by-side with the current Pylons
> >> >>>>>>>> implementation. I'm doing it like this so that it isn't
> >> >>>>>>>> immediately obvious that there's a migration happening towards
> >> >>>>>>>> using Flask (aka nothing breaks). I don't think this branch
> >> >>>>>>>> should ever be merged, it's more exploratory, but it has raised
> >> >>>>>>>> some questions that I think it would be good to discuss.
> >> >>>>>>>>
> >> >>>>>>>> WARNING:anecdata
> >> >>>>>>>> It's pretty clear that the vast majority of people asked would
> >> >>>>>>>> like to move to Flask as a replacement for some layers of the
> >> >>>>>>>> system (leaving things like logic and plugins alone).
> >> >>>>>>>> ENDWARNING
> >> >>>>>>>>
> >> >>>>>>>> We've discussed this at the tech-team meetings, but I think a
> >> >>>>>>>> longer, more accessible conversation would be beneficial.
> >> >>>>>>>>
> >> >>>>>>>> 1. What version of CKAN should be targeted? Common sense
> >> >>>>>>>> suggests 3.0, but that being the case, exactly how far can we
> >> >>>>>>>> go in breaking some backward compatibility? This isn't really a
> >> >>>>>>>> technical question - would be good to hear what the community
> >> >>>>>>>> would accept...
> >> >>>>>>>>
> >> >>>>>>>> 2. Does it *really* need to be side-by-side? Running Flask and
> >> >>>>>>>> Pylons side-by-side means staying on Python 2 for another few
> >> >>>>>>>> years (because Pylons). A reasonably deep incision and removal
> >> >>>>>>>> of non-logic/non-plugin code would make a move to Py3 easier,
> >> >>>>>>>> but with some level of breakage in external plugins. Staying on
> >> >>>>>>>> 2 would mean a move to 3 at a later date and more pain.
> >> >>>>>>>>
> >> >>>>>>>> 3. Would the CKAN Association like to fund someone to do some
> >> >>>>>>>> of this work? This is just one of several ideas mentioned on
> >> >>>>>>>> https://github.com/ckan/ideas-and-roadmap/issues/152 that
> >> >>>>>>>> really needs to be done if CKAN is going to thrive instead of
> >> >>>>>>>> just survive.
> >> >>>>>>>>
> >> >>>>>>>> Any feedback welcome...
> >> >>>>>>>>
> >> >>>>>>>> Cheers
> >> >>>>>>>>
> >> >>>>>>>> Ross.
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> _______________________________________________
> >> >>>>>>>> ckan-dev mailing list
> >> >>>>>>>> ckan-dev at lists.okfn.org
> >> >>>>>>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
> >> >>>>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>> Hi Ross,
> >> >>>>>>>
> >> >>>>>>> I believe that a Flask port (or rewrite) is an excellent idea
> >> >>>>>>> for CKAN 3.0 in order to support Python 3.x.
> >> >>>>>>> The alternative would be to port Pylons to Python 3.x, which
> >> >>>>>>> perhaps is a more difficult task...
> >> >>>>>>>
> >> >>>>>>> Given that Python 2.x will EOL relatively soon, CKAN should move
> >> >>>>>>> forward.
> >> >>>>>>>
> >> >>>>>>> Just my 2 cents.
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> Angelos
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> Angelos Tzotsos, PhD
> >> >>>>>>> OSGeo Charter Member
> >> >>>>>>> http://users.ntua.gr/tzotsos
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>> --
> >> >>>> *STEVEN DE COSTA *|
> >> >>>> *EXECUTIVE DIRECTOR*www.linkdigital.com.au
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >> Karissa McKelvey
> >> >> http://karissa.github.io/ <http://karissamck.com>
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Karissa McKelvey
> >> > http://karissa.github.io/
> >> >
> >> >
> >> >
> >> ------------------------------
> >>
> >> End of ckan-dev Digest, Vol 59, Issue 33
> >> ****************************************
> >
> >
> >
> >
>
>
>
> --
> Karissa McKelvey
> http://karissa.github.io/
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 14 Sep 2015 20:39:03 -0700
> From: Karissa McKelvey <karissa.mckelvey at gmail.com>
> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
> Subject: Re: [ckan-dev] Future, flask, breaking things, funding.
> Message-ID: <CAMmgt+y+6B2BWgsdCE6rWr3mQVpKrSkXztPXxcb29jAfL05xVg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hey Steven! Yeah, let's get together. I'll be around those days.
>
> On Mon, Sep 14, 2015 at 7:20 PM, Steven De Costa <
> steven.decosta at linkdigital.com.au> wrote:
>
> > I'll be in San Francisco 4-6 October if you wanted to catch up and look
> at
> > it together Karissa?
> >
> > I also have some thoughts about remaining flexible in the storage types
> > that CKAN might support. Basically, it would be nice if these were
> > abstracted into an API and created via the admin as provisioning
> requests.
> > This would allow a platform to provision a variety of storage options and
> > enable them at a resource level similar to the resource views at the UI
> > level. It would also allow for network level security models to be
> > employed, or data storage sovereignty to be maintained in accordance to
> to
> > jurisdictional or security classification. Maybe we could call these
> > resource containers?
> >
> > Happy to catch up with anyone in SF re CKAN :) In fact, happy to run a
> > meetup there if there is interest... physical + video conference.
> >
> > I'm in Vegas for re:Invent on the 7th to 9th too :) It would be good to
> > form a huddle of CKANers at the re:Play party on the 8th!
> >
> > Cheers,
> > Steven
> >
> > *STEVEN DE COSTA *|
> > *EXECUTIVE DIRECTOR*www.linkdigital.com.au
> >
> >
> >
> > On 15 September 2015 at 08:41, Karissa McKelvey <
> > karissa.mckelvey at gmail.com> wrote:
> >
> >> I think Dat would be a great way to allow programmatic access to
> datasets
> >> in CKAN. Dat handles streaming data very well. I imagine being able to
> >> replace the `.csv` with a `.dat` and get streaming and incremental
> uploads
> >> and downloads.
> >>
> >> Dat has a two-phase sync process, the first computes the differences
> >> between the local and remote copy, and the second syncs the data that is
> >> different. This leads to users never having to download the same data
> >> twice, and reduces bandwidth costs for the host. Because dat knows the
> >> differences between each data version, it is also a really lightweight
> way
> >> to see and overview of previous data versions for a single dataset.
> >>
> >> I'd be happy to chat more about how this might work in practice!
> >>
> >> Cheers,
> >>
> >> On Mon, Sep 14, 2015 at 3:37 PM, Karissa McKelvey <
> >> karissa.mckelvey at gmail.com> wrote:
> >>
> >>> I think Dat would be a great way to allow programmatic access to
> >>> datasets in CKAN. Dat handles streaming data very well. I imagine being
> >>> able to replace the `.csv` with a `.dat` and get streaming and
> incremental
> >>> uploads and downloads.
> >>>
> >>> Dat has a two-phase sync process, the first computes the differences
> >>> between the local and remote copy, and the second syncs the data that
> is
> >>> different. This leads to users never having to download the same data
> >>> twice, and reduces bandwidth costs for the host. Because dat knows the
> >>> differences between each data version, it is also a really lightweight
> way
> >>> to see and overview of previous data versions for a single dataset.
> >>>
> >>> I'd be happy to chat more about how this might work in practice!
> >>>
> >>> Cheers,
> >>>
> >>>
> >>> On Mon, Sep 14, 2015 at 2:51 PM, Joel Natividad <
> >>> joel.natividad at ontodia.com> wrote:
> >>>
> >>>> Hi all,
> >>>> What about integrating with Dat <http://dat-data.com>?
> >>>>
> >>>> It handles streaming data; can handle huge datasets; can do deltas (no
> >>>> need to re-download a huge dataset over and over again) ; has
> versions (not
> >>>> just revisions as data consumers have legitimate reasons to use
> different
> >>>> versions of data, down to the row level), and makes CKAN more
> "dog-fooding"
> >>>> friendly (i.e. publishers using it not only to publish data, but to
> >>>> actually build solutions ).
> >>>>
> >>>> Marianne Bellotti (CKAN-powered HDX) and I independently spent some
> >>>> quality time with Karissa McKelvey - one of the three key developers
> >>>> behind Dat <http://dat-data.com/team>, when she was in NYC last month
> >>>> and discussed at length how Dat + CKAN can work together.
> >>>>
> >>>> Karissa even put together a rough spec on a "ckanext-dat" extension.
> >>>>
> >>>> FYI, Dat is supported by usopendata.org
> >>>> <https://usopendata.org/2015/07/29/dat-beta/>, which also happens to
> >>>> be the org behind CKAN-Multisite, which was just announced as
> >>>> generally available today.
> >>>> <https://usopendata.org/2015/09/14/ckan-multisite/>
> >>>>
> >>>> Best,
> >>>> Joel
> >>>>
> >>>>
> >>>> --
> >>>> Joel Natividad
> >>>> +1 347-565-5635
> >>>> @jqnatividad
> >>>>
> >>>> Ontodia, Inc.
> >>>> 137 Varick Street, 2nd Floor, New York, NY 10013
> >>>>
> >>>> On Mon, Sep 14, 2015 at 5:11 PM, Steven De Costa <
> >>>> steven.decosta at linkdigital.com.au> wrote:
> >>>>
> >>>>> I'm 'all in' on this discussion :) I'll setup a doodle and we can
> pick
> >>>>> a time to do a video call...
> >>>>>
> >>>>> My 2c on some points.
> >>>>>
> >>>>> 1. Perhaps redev could be bottom up. Start with resources and widen
> >>>>> its ability. Crud can then be rebuilt over the top.
> >>>>> 2. Carefully consider the longest term possible and how the app may
> >>>>> mature in the future.
> >>>>> 3. Consider interoperability between n+1 platforms via linked open
> >>>>> data, again with realtime in mind
> >>>>> 4. Consider packages further. Could we add new package types that are
> >>>>> built on 3.0 thinking and have them co exist with current packages?
> If so
> >>>>> then existing extensions could be modified less dramatically to
> apply only
> >>>>> to v2 packages.
> >>>>> 5. Think about migration scenarios. Could a v2 CKAN remain as a dumb
> >>>>> web app harvesting from a 3.0? If so, we could priorities workflows
> around
> >>>>> custodians and ETL before end users.
> >>>>> 6. Yes I'm sure others in the steering group would support the work.
> >>>>> Just remember they are also just volunteers :)
> >>>>> 7. Yes I'm sure funding could come from the Association, just so long
> >>>>> as funding first goes into the association. So, we'd all have a part
> to
> >>>>> play in signing up paying members - happy to take any leads from
> people on
> >>>>> that point :)
> >>>>>
> >>>>> Hoots!
> >>>>>
> >>>>>
> >>>>> On Tuesday, September 15, 2015, Denis Zgonjanin <
> >>>>> deniszgonjanin at gmail.com> wrote:
> >>>>>
> >>>>>> Yes, we should think of use cases. Realtime data is just one. I'm
> not
> >>>>>> just talking about things we might want to do. Here are the current
> things
> >>>>>> in CKAN that would benefit from better asynchronous support:
> >>>>>>
> >>>>>> - Datastore & Datapusher. We could integrate datapusher into CKAN,
> so
> >>>>>> people don't need to set up an additional web service just to use
> stock
> >>>>>> CKAN.
> >>>>>> - Harvesting. Set up a periodic callback that calls harvest sources
> >>>>>> every hour. Super easy when compared to having to set up
> reddit/ZeroMQ, and
> >>>>>> another 3(!) long-running processes running in the background.
> >>>>>> - Webhooks. They must be pushed off to a celery queue because of
> >>>>>> Pylons. With async they could be fired off easily.
> >>>>>> - Analytics & analytics reports; Sending automated emails and other
> >>>>>> automated tasks.
> >>>>>> - Anything where right now we have to set up cron jobs.
> >>>>>>
> >>>>>> And probably most importantly - CKAN is going to need a face lift
> >>>>>> eventually if it's to remain relevant. It can't be stuck in CRUD
> land
> >>>>>> forever. There is plenty of time for this, no rush. But building
> cool
> >>>>>> shinny new things with fancy front-end javascript would be hard
> right now.
> >>>>>> It will be hard on any web framework built on the idea that your
> whole
> >>>>>> application context is transferred to the user on every HTTP
> request, and
> >>>>>> that nothing else except that is going on in the backend.
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Sep 14, 2015 at 9:34 AM, St?phane Guidoin <
> >>>>>> stephane.guidoin at gmail.com> wrote:
> >>>>>>
> >>>>>>> *Now that government is (slowly) catching on, more stream, API, and
> >>>>>>> even real-time data is being published. CKAN doesn't do a great
> job here.
> >>>>>>> The biggest obstacle to creating nice extensions to CKAN for
> non-file data
> >>>>>>> is that Pylons is still firmly stuck within the HTTP
> request-response
> >>>>>>> lifecycle. *
> >>>>>>>
> >>>>>>> I wonder what should be the role of CKAN when it comes to APIs,
> >>>>>>> streams and other things. Those stuff tend to be fairly resource
> intensive
> >>>>>>> and most of the time, they are developed and hosted on their own,
> not on
> >>>>>>> the open data portal. So what should be the role of CKAN on this?
> How much
> >>>>>>> do we want to be able to integrate CKAN with APIs and streams,
> what should
> >>>>>>> it give?
> >>>>>>>
> >>>>>>> From my point of view, moving to Flask or other, framework is
> mostly
> >>>>>>> a question of technical debt (
> >>>>>>> https://18f.gsa.gov/2015/08/07/technical-debt-1/) and making sure
> >>>>>>> CKAN remains flexible (and build-in async would indeed help)
> >>>>>>>
> >>>>>>> When it comes to see how to support realtime data, even if it's to
> >>>>>>> mainly enable extension development, some thinking about use case
> is needed
> >>>>>>> in order to avoid jumping into something that would be very time
> intensive
> >>>>>>> in terms of dev.
> >>>>>>>
> >>>>>>> St?phane
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2015-09-14 08:57, Denis Zgonjanin wrote:
> >>>>>>>
> >>>>>>> Right now CKAN is great for static sources of data, which is really
> >>>>>>> all that existed from government sources when CKAN was first
> written.
> >>>>>>>
> >>>>>>> Now that government is (slowly) catching on, more stream, API, and
> >>>>>>> even real-time data is being published. CKAN doesn't do a great
> job here.
> >>>>>>> The biggest obstacle to creating nice extensions to CKAN for
> non-file data
> >>>>>>> is that Pylons is still firmly stuck within the HTTP
> request-response
> >>>>>>> lifecycle.
> >>>>>>>
> >>>>>>> This worked well for CRUD apps, but now is really showing its
> >>>>>>> limitations. It's hard to do anything in CKAN that doesn't take
> place
> >>>>>>> within the context of a user's HTTP request. If you want to do
> some extra
> >>>>>>> data processing on the side, you have to use celery queues or
> worse, cron.
> >>>>>>> Worse yet, some people do try to put extra processing inside the
> >>>>>>> request-response lifecycle, causing problems.
> >>>>>>>
> >>>>>>> Even core CKAN is guilty of this. For example, CKAN will call
> >>>>>>> datapusher to send upload jobs and retrieve job results, and those
> requests
> >>>>>>> to datapusher happen while the user is waiting for the request to
> return.
> >>>>>>> This is kind of terrible. Not even because somebody did it this
> way, but
> >>>>>>> because CKAN doesn't give you a sane alternative to do it properly.
> >>>>>>>
> >>>>>>> Porting CKAN to Flask is no small feat, so let's make sure we do it
> >>>>>>> right. Now that we're not using CKAN to just host static files
> anymore, we
> >>>>>>> need to have better, built-in async support in CKAN. Perhaps this
> means
> >>>>>>> moving to Python 3 where we'll have asyncio (and hopefully a
> future version
> >>>>>>> of Flask will work well with it). Other frameworks, like Tornado,
> are also
> >>>>>>> quite lightweight and support this out of the box for Python 2.x.
> >>>>>>>
> >>>>>>> - Denis
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Sep 14, 2015 at 3:56 AM, Angelos Tzotsos <
> >>>>>>> gcpp.kalxas at gmail.com> wrote:
> >>>>>>>
> >>>>>>>> On 09/14/2015 10:24 AM, Ross Jones wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I've recently been playing about with implementing parts of CKAN
> >>>>>>>>> in Flask side-by-side with the current Pylons implementation.
> I'm doing it
> >>>>>>>>> like this so that it isn't immediately obvious that there's a
> migration
> >>>>>>>>> happening towards using Flask (aka nothing breaks).  I don't
> think this
> >>>>>>>>> branch should ever be merged, it's more exploratory but it has
> raised some
> >>>>>>>>> questions that I think it would be good to discuss.
> >>>>>>>>>
> >>>>>>>>> WARNING:anecdata
> >>>>>>>>> It's pretty clear that the vast majority of people asked would
> >>>>>>>>> like to move to Flask as a replacement for some layers of the
> system
> >>>>>>>>> (leaving things like logic and plugins alone).
> >>>>>>>>> ENDWARNING
> >>>>>>>>>
> >>>>>>>>> We've discussed at the tech-team meetings, but I think a longer,
> >>>>>>>>> more accessible conversation would be beneficial.
> >>>>>>>>>
> >>>>>>>>> 1. What version of CKAN should be targeted? Common sense suggests
> >>>>>>>>> 3.0, but that being the case, exactly how far can we go in
> breaking some
> >>>>>>>>> backward compatibility?  This isn't really a technical question
> - would be
> >>>>>>>>> good to hear what the community would accept ?
> >>>>>>>>>
> >>>>>>>>> 2. Does it *really* need to be side-by-side?  Running Flask and
> >>>>>>>>> Pylons side-by-side means staying on Python 2 for another few
> years
> >>>>>>>>> (because Pylons).  A reasonably deep incision and removal of
> >>>>>>>>> non-logic/non-plugin code would make a move to Py3 easier, but
> with some
> >>>>>>>>> level of breakage in external plugins. Staying on 2 would mean a
> move to 3
> >>>>>>>>> at a later date and more pain.
> >>>>>>>>>
> >>>>>>>>> 3. Would the CKAN Association like to fund someone to do some of
> >>>>>>>>> this work? This is just one of several ideas mentioned on
> >>>>>>>>> https://github.com/ckan/ideas-and-roadmap/issues/152 that really
> >>>>>>>>> needs to be done if CKAN is going to thrive instead of just
> survive.
> >>>>>>>>>
> >>>>>>>>> Any feedback welcome?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> Ross.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> ckan-dev mailing list
> >>>>>>>>> ckan-dev at lists.okfn.org
> >>>>>>>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
> >>>>>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Ross,
> >>>>>>>>
> >>>>>>>> I believe that a Flask port (or rewrite) is an excellent idea for
> >>>>>>>> CKAN 3.0 in order to support Python 3.x.
> >>>>>>>> The alternative would be to port Pylons to Python 3.x, which
> >>>>>>>> perhaps is a more difficult task...
> >>>>>>>>
> >>>>>>>> Given that Python 2.x will EOL relatively soon, CKAN should move
> >>>>>>>> forward.
> >>>>>>>>
> >>>>>>>> Just my 2 cents.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Angelos
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Angelos Tzotsos, PhD
> >>>>>>>> OSGeo Charter Member
> >>>>>>>> http://users.ntua.gr/tzotsos
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> *STEVEN DE COSTA* | *EXECUTIVE DIRECTOR*
> >>>>> www.linkdigital.com.au
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Karissa McKelvey
> >>> http://karissa.github.io/ <http://karissamck.com>
> >>>
> >>>
> >>
> >>
> >> --
> >> Karissa McKelvey
> >> http://karissa.github.io/
> >>
> >>
> >>
> >
> >
> >
>
>
> --
> Karissa McKelvey
> http://karissa.github.io/
>
> ------------------------------
>
> End of ckan-dev Digest, Vol 59, Issue 35
> ****************************************
>