[ckan-dev] Performance work, review requested

David Raznick david.raznick at okfn.org
Thu Jul 4 14:57:39 UTC 2013


On Wed, Jul 3, 2013 at 10:41 PM, Ian Ward <ian at excess.org> wrote:

> I would like to know how likely some changes are to be accepted
> into CKAN. I'd also like some help from people that know the code
> better in resolving some of the test failures.
>
> These changes are related to leaning on SOLR a little more for better
> performance of package_show: http://github.com/okfn/ckan/pull/1079
>
> 53a4e5c (1.9x faster): use the data_dict in solr instead of dictizing
> the models from the DB, when possible. I might not have the "when
> possible" part correct here.
>

This is great, and this change has been under consideration for a while.
I think leaning on solr for speed is really good.

My main concern, and the reason we have not done this already, is not
covered by your pull request and may require a little thought:

There are options to make solr commit asynchronously when saving a dataset,
i.e. not waiting for the commit to happen in the same request that updated
the dataset.  So if solr becomes the source of the data_dict, displaying
the dataset just after saving it will show the old version, which will be
very confusing for the user.  We have that problem now, but there it only
means the dataset takes a while to appear in the search listings, which is
less of a bother (people will not immediately go and search for it).  I
also have general fears about making solr the canonical source of the
data_dict, especially for editing, on the very off chance that the index
and the db are out of sync.  Nonetheless, these probably have workarounds,
most likely a context option that says when it is appropriate to use solr
and when it is better to use the db.
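
As a rough sketch of that kind of context option (the key name
use_search_index and the helper functions are made up for illustration,
they are not CKAN's actual API), the read path could fall back to the db
unless the caller explicitly opts in to the index:

```python
# Hypothetical sketch: none of these names are CKAN's real API.
# The two dicts stand in for the database and the solr index.
DB = {"my-dataset": {"name": "my-dataset", "title": "New title"}}
# Stale copy: the asynchronous solr commit has not happened yet.
SOLR = {"my-dataset": {"name": "my-dataset", "title": "Old title"}}


def load_from_db(name):
    """Stand-in for dictizing the model from the database."""
    return dict(DB[name])


def load_from_index(name):
    """Stand-in for reading the stored data_dict from solr."""
    doc = SOLR.get(name)
    return dict(doc) if doc is not None else None


def package_show(context, data_dict):
    name = data_dict["id"]
    # Only trust the index when the caller explicitly says it is acceptable.
    if context.get("use_search_index", False):
        cached = load_from_index(name)
        if cached is not None:
            return cached
    # Default: the db stays the canonical source.
    return load_from_db(name)


# Read-only API call: speed matters, slight staleness is tolerable.
fast = package_show({"use_search_index": True}, {"id": "my-dataset"})
# Edit form, or the page shown right after a save: always use the db.
fresh = package_show({}, {"id": "my-dataset"})
```

Making the db the default means existing callers keep today's behaviour,
and only the calls that can tolerate a stale result need to change.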

This also plays badly with the before_view extension point (which
admittedly is a bad one).  If you look at the package search, it does this
after receiving the raw data_dict from the search index:

https://github.com/okfn/ckan/blob/master/ckan/logic/action/get.py#L1374
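
The pattern there is roughly the following. This is a simplified,
self-contained sketch and not the linked code verbatim: the plugin class
and the PLUGINS list are stand-ins for the registered plugins that
implement before_view, and the for_view flag is an assumption about how
the caller signals that the result is for display.

```python
import json


class UppercaseTitlePlugin:
    """Stand-in for a plugin implementing before_view (hypothetical)."""

    def before_view(self, pkg_dict):
        pkg_dict["title"] = pkg_dict["title"].upper()
        return pkg_dict


# Stand-in for the list of registered plugins.
PLUGINS = [UppercaseTitlePlugin()]


def dicts_from_index(raw_results, for_view):
    results = []
    for raw in raw_results:
        # The data_dict comes back from the index as a JSON string.
        pkg_dict = json.loads(raw)
        if for_view:
            # Each plugin gets a chance to modify the dict before display.
            for plugin in PLUGINS:
                pkg_dict = plugin.before_view(pkg_dict)
        results.append(pkg_dict)
    return results


raw = ['{"name": "a", "title": "first"}']
viewed = dicts_from_index(raw, for_view=True)
plain = dicts_from_index(raw, for_view=False)
```

A package_show that reads from solr would need the same hook applied to
the decoded dict, otherwise plugins relying on before_view see nothing.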






>
> 973bb8c (4.9x faster): store the package_show-schema validated version
> in SOLR data_dict to reduce the work when calling package_show. This
> moves some work to when packages are updated and created, but I
> expect that this penalty can be removed because we probably have
> already just generated a validated version of the package (no
> optimization has been done here yet).
>

I imagine the difference will not be so big for less customized schemas.

I cannot get my head round how this affects the before_view extension
point, but I am sure it could break it in certain circumstances.  I am not
too worried about the breakage though.

There could also be issues with some extensions using the
validate=False config flag, and I do not think this is honoured by this
pull request.

I would be more inclined to keep copies of both the validated and
unvalidated data_dicts in solr, which would make this possible.  (I am not
too worried about space issues.)
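
Something along these lines, as a sketch only: the field name
validated_data_dict, the validate() helper and the validate_flag parameter
are assumptions for illustration, not what the pull request does.

```python
import json


def validate(pkg_dict):
    """Stand-in for running the package_show schema (hypothetical)."""
    validated = dict(pkg_dict)
    validated["title"] = validated["title"].strip()
    return validated


def build_index_doc(pkg_dict):
    # Store both forms at index time so reads never have to validate.
    return {
        "id": pkg_dict["id"],
        "data_dict": json.dumps(pkg_dict),
        "validated_data_dict": json.dumps(validate(pkg_dict)),
    }


def show_from_index(doc, validate_flag=True):
    # Callers that ask for validate=False get the untouched copy.
    field = "validated_data_dict" if validate_flag else "data_dict"
    return json.loads(doc[field])


doc = build_index_doc({"id": "abc", "title": "  Spending data  "})
validated = show_from_index(doc)
unvalidated = show_from_index(doc, validate_flag=False)
```

The cost is one extra stored field per dataset, in exchange for serving
either form without touching the db.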


>
>
> f2a4822 (8x faster): allow actions to return a json string instead of
> decoded json data and pass that directly to the caller, skipping the
> work decoding json just to re-encode it on the other end. This might
> not be the best implementation, but it does offer an extra 60%
> improvement, and could be useful for other API calls too.
>

If anyone uses before_view, this definitely breaks it.  I do not like the
way this is implemented either, and the placeholder should be a bit uglier
and longer at least.
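
To illustrate what I mean by a longer placeholder, here is a minimal
sketch of passing an already-encoded JSON result through the API envelope.
It is not the implementation in the pull request; the marker format and
the api_response() helper are invented for the example.

```python
import json
import uuid

# Long, random marker so it cannot plausibly appear in real data
# (hypothetical; not the marker used in the pull request).
PLACEHOLDER = "__ckan_raw_json_result_%s__" % uuid.uuid4().hex


def api_response(result_json, help_text):
    """Wrap an already-encoded JSON result without decoding it."""
    envelope = json.dumps(
        {"help": help_text, "success": True, "result": PLACEHOLDER}
    )
    # json.dumps quotes the placeholder, so swap the quoted marker
    # for the raw JSON text of the result.
    return envelope.replace(json.dumps(PLACEHOLDER), result_json, 1)


stored = '{"name": "my-dataset", "tags": ["a", "b"]}'  # as read from solr
body = api_response(stored, "package_show")
```

A short or guessable marker risks colliding with text in the help string
or the data itself, which is why a random, ugly one is safer.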


>
> One change to package_list is awaiting review:
> https://github.com/okfn/ckan/pull/1042
>
> The commits currently on the PR (10x faster) have been merged
>
> The remaining change
> https://github.com/okfn/ckan/pull/1042#issuecomment-20067963 (18x
> faster) involves using raw SQL for the query. I'm happy to put this
> code as a method on the model, but the discussion side-tracked onto
> what sort of parameters that method should have. Would a patch that
> has no parameters and just returns all the ids be acceptable for now?
> I'd like to have the parameter discussion, but that seems like a
> separate thing.
>
>
> Getting some feedback would be really helpful. I'm going to be looking
> at a number of other calls that are really slow on our system, and I'd
> like to know what sorts of things I can get away with :-)
>
> Ian