[ckan-dev] Reply: Re: Caching

Ian Ward ian at excess.org
Wed Mar 30 17:23:43 UTC 2016


On Wed, Mar 30, 2016 at 12:52 PM,  <Florian.Brucker at mb.karlsruhe.de> wrote:
> Thanks for your input, Ian!
>
> Actually I'm not sure whether this is an oversight. Consider the following
> example:
>
> 1. I request "dataset/my_dataset". For the response,
> "templates/package/read.html" is rendered, using the package data supplied
> via the "extra_vars" argument to "lib.base.render"
> 2. The dataset "my_dataset" changes (e.g. someone updates the description)
> 3. I request "dataset/my_dataset" again.
>
> Now if we simply allow caching in this case then I will not get an updated
> version in the final step, because my requests do not differ (but the
> content of the extra_vars does). So it is, in general, the right thing to
> not cache these pages naively.

Of course. You're right.

> However, this means that we only allow caching for static files. To also
> allow caching of dynamic files we need a way to tell whether the cached
> version uses the same extra_vars values as we would use in a new render. One
> possibility to do this is to hash the extra_vars and use the ETag HTTP
> header. However, this requires us to implement the necessary machinery
> inside CKAN (because this cannot be done in a reverse proxy like Nginx).
> This doesn't reduce the number of requests (because the client has to ask if
> his version is up to date) or the time to look up the necessary information
> in the database but it reduces bandwidth and template rendering time if you
> hit the cache (because you can then simply return HTTP 304 Not Modified
> instead of a full response). A priori I cannot tell how much one would gain
> from such an approach but it might be worth a try.

I can think of a number of pages that wouldn't work well with this
approach. For example the dashboard page does most of its work before
rendering the response, so we won't be saving much server time.
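
For reference, the hashing idea quoted above could look roughly like
the following. This is only a sketch in plain Python, not CKAN code:
`etag_for_vars` and `respond` are made-up names, and it assumes the
`extra_vars` can be serialized to JSON (with `default=str` as a crude
fallback for dates and the like).

```python
import hashlib
import json


def etag_for_vars(extra_vars):
    """Return a quoted ETag derived from the template variables."""
    # sort_keys makes the serialization independent of dict ordering;
    # default=str is a crude fallback for dates and similar objects.
    payload = json.dumps(extra_vars, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return '"%s"' % digest


def respond(extra_vars, if_none_match, render):
    """Return (status, headers, body), skipping the render on a match."""
    etag = etag_for_vars(extra_vars)
    if if_none_match == etag:
        # The client already has this version: no template rendering.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, render(extra_vars)
```

Note that the variables still have to be collected before the hash can
be computed, so only the rendering and the transfer are saved.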

We should be able to progressively add ETag support to pages that need
it while minimizing the server load. The dataset page could generate
an ETag based on the dataset's last modified date and the user's permission
for that dataset (whether or not to render the edit button). We'll have
to address each page carefully, and figure out a way to handle the
activity notification widget (some extra JS instead of static
content?).
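
A rough sketch of the dataset-page idea, assuming we have a
last-modified timestamp for the dataset (called `metadata_modified`
here) and already know whether the user may edit it. Both helper
names are hypothetical, not existing CKAN functions:

```python
import hashlib


def dataset_etag(metadata_modified, can_edit):
    """Build an ETag from the last-modified stamp and the edit right."""
    # The permission is part of the key because the page differs
    # (edit button or not) even when the dataset itself is unchanged.
    key = "%s|%s" % (metadata_modified, "edit" if can_edit else "view")
    return '"%s"' % hashlib.sha1(key.encode("utf-8")).hexdigest()


def not_modified(if_none_match, etag):
    """True when the client's If-None-Match header lists this ETag."""
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    return "*" in candidates or etag in candidates
```

When `not_modified` is true the view could return 304 before doing any
of the expensive work. The activity notification widget is not covered
by this key and would still need separate handling.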

> Another possibility is to allow naive caching of dynamic pages but only with
> a very short expiration time (i.e. a few seconds). This is often called
> micro-caching and helps if you have a lot of simultaneous requests. In that
> case one would probably have two different expiration settings, one for
> static and one for dynamic content.

That's probably worth doing, but a few seconds might be too long if
someone is editing a single value and re-saving.
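
To make the trade-off concrete, here is a minimal in-process sketch of
micro-caching. In practice this would more likely live in the reverse
proxy; `MicroCache` and its methods are invented for illustration, and
the explicit `invalidate` call is just one assumed way to deal with
the edit-and-re-save case:

```python
import time


class MicroCache(object):
    """Cache rendered pages for a very short time, keyed by URL."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry can be tested
        self.entries = {}   # url -> (expiry time, body)

    def get_or_render(self, url, render):
        now = self.clock()
        hit = self.entries.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]
        # Missing or expired: render once and remember the result.
        body = render()
        self.entries[url] = (now + self.ttl, body)
        return body

    def invalidate(self, url):
        """Drop an entry, e.g. right after the dataset was edited."""
        self.entries.pop(url, None)
```

With a TTL of one or two seconds, many simultaneous requests for the
same URL share a single render, while an edit followed by `invalidate`
is visible on the next request.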

> I've just started to dive into CKAN's caching system so I might be missing a
> lot of stuff. I also don't know about previous or ongoing discussions
> regarding this topic, so feel free to point me towards more information.

I'm not aware of recent caching discussion, so this might be one area
of CKAN that could really use some new contributions.

>
>
> Regards,
> Florian
>
> "ckan-dev" <ckan-dev-bounces at lists.okfn.org> wrote on 30.03.2016 17:25:16:
>
>> From: Ian Ward <ian at excess.org>
>> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
>> Date: 30.03.2016 17:26
>> Subject: Re: [ckan-dev] Caching
>> Sent by: "ckan-dev" <ckan-dev-bounces at lists.okfn.org>
>>
>> I'm not sure why caching is disabled when variables are passed to the
>> templates. Seems like an oversight. Would you like to submit a patch
>> to remove that check?
>>
>> On Wed, Mar 30, 2016 at 6:18 AM,  <Florian.Brucker at mb.karlsruhe.de> wrote:
>> > I did some further research in the caching logic in `render` in
>> > `lib/base.py`. It seems that in my case caching is always disabled due to
>> > the `extra_vars` being passed via parameter. These extra variables contain
>> > the data to be filled into the templates (e.g. the package information when
>> > displaying a dataset), and hence a change in these variables would
>> > invalidate a cached version of the template for the same URL (if we would
>> > allow caching in that case). From my research this kind of content-based
>> > cache invalidation (e.g. using an ETag-header) seems hard to do on the
>> > Nginx-level and should rather be done in the application (i.e. CKAN)
>> > itself.
>> > Although there seems to be some kind of page-caching built into CKAN (cf.
>> > the `ckan.page_cache_enabled` configuration option) it seems to respect the
>> > CKAN_PAGE_CACHABLE setting set by `lib.base.render`, which, as discussed,
>> > won't allow caching of pages when extra variables are set.
>> >
>> > So I guess my questions boil down to the following:
>> >
>> > - Does CKAN support caching of rendered templates? If not, are there any
>> > plans to add such a feature?
>> >
>> > - What benefit do I have from using Nginx in front of Apache + CKAN if Nginx
>> > will only cache static content?
>> >
>> >
>> > Regards,
>> > Florian
>> >
>> >
>
>> > "ckan-dev" <ckan-dev-bounces at lists.okfn.org> wrote on 29.03.2016 17:20:56:
>> >
>> >> From: Florian.Brucker at mb.karlsruhe.de
>> >> To: ckan-dev at lists.okfn.org
>> >> Date: 29.03.2016 17:21
>> >> Subject: [ckan-dev] Caching
>> >> Sent by: "ckan-dev" <ckan-dev-bounces at lists.okfn.org>
>> >
>> >>
>> >> Hello everybody,
>> >>
>> >> I'm using Nginx as a caching proxy in front of Apache serving our
>> >> CKAN instance, as described in the CKAN docs [0]. Most things work
>> >> fine, however I've noticed that CKAN seems to add a "Cache-Control:
>> >> private" HTTP-header to all responses except static files, even if
>> >> the request does not come from a logged-in user. I do understand why
>> >> one might want to bypass the cache for logged-in users (as it's done
>> >> in the Nginx config example in the CKAN docs [1]), but why disable
>> >> (public) caching for the generated HTML for not-logged-in requests?
>> >> In my tests without a cache, page generation took much longer than
>> >> downloading all the static content, so caching the generated HTML
>> >> would make a lot of sense in my opinion.
>> >>
>> >>
>> >> [0] http://docs.ckan.org/en/latest/maintaining/installing/deployment.html
>> >> [1] http://docs.ckan.org/en/latest/maintaining/installing/deployment.html#create-the-nginx-config-file
>> >>
>> >>
>> >> Regards,
>> >> Florian
>> >> _______________________________________________
>> >> ckan-dev mailing list
>> >> ckan-dev at lists.okfn.org
>> >> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> >> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev


