[ckan-dev] Perfomance work, review requested

Toby Dacre toby.okfn at gmail.com
Thu Jul 4 09:12:50 UTC 2013


On 3 July 2013 22:41, Ian Ward <ian at excess.org> wrote:
> I would like to know how likely some changes are to being accepted
> into CKAN. I'd also like some help from people that know the code
> better in resolving some of the test failures.
>
> These changes are related to leaning on SOLR a little more for better
> performance of package_show: http://github.com/okfn/ckan/pull/1079
>
> 53a4e5c (1.9x faster): use the data_dict in solr instead of dictizing
> the models from the DB, when possible. I might not have the "when
> possible" part correct here.
>
> 973bb8c (4.9x faster): store the package_show-schema validated version
> in SOLR data_dict to reduce the work when calling package_show. This
> moves some work to the when packages are updated and created, but I
> expect that this penalty can be removed because we probably have
> already just generated a validated version of the package (no
> optimization has been done here yet).
>
> f2a4822 (8x faster): allow actions to return a json string instead of
> decoded json data and pass that directly to the caller, skipping the
> work decoding json just to re-encode it on the other end. This might
> not be the best implementation, but it does offer an extra 60%
> improvement, and could be useful for other API calls too.
>

The main thing is backward compatibility to not break things.  Your PR
1079 seems to do this - I've only had a quick skim.  One thing I might
change is `json_string` and use `return_type` as that feels like it
has more flexibility but others may disagree there


>
> One change to package_list is awating review:
> https://github.com/okfn/ckan/pull/1042
>
> The commits currently on the PR (10x faster) have been merged
>
> The remaining change
> https://github.com/okfn/ckan/pull/1042#issuecomment-20067963 (18x
> faster) involves using raw SQL for the query. I'm happy to put this
> code as a method on the model, but the discussion side-tracked onto
> what sort of parameters that method should have. Would a patch that
> has no parameters and just returns all the ids be acceptable for now?
> I'd like to have the parameter discussion, but that seems like a
> separate thing.
>

the two things are independent so can be done separately imho

>
> Getting some feedback would be really helpful. I'm going to be looking
> at a number of other calls that are really slow on our system, and I'd
> like to know what sorts of things I can get away with :-)

I think the main thing is to make small pull requests as these are
quicker/easier for us to review and merge and less likely to get stuck
in the queue.  Stats showing improvements are really helpful too.

I think anything that helps speed up the ckan beast will be
appreciated so long as it does not over-complicate the code.

The tests can be a pain irc is probably the best place to request help with them

Thanks

>
> Ian
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev



-- 
Toby Dacre

The Open Knowledge Foundation

Empowering through Open Knowledge
http://okfn.org/  |  @okfn




More information about the ckan-dev mailing list