[ckan-dev] API Performance

Ross Jones ross at servercode.co.uk
Tue Jun 25 10:54:43 UTC 2013


On 25 Jun 2013, at 11:15, Toby Dacre <toby.okfn at gmail.com> wrote:
> This does sound like something we should be looking at.  Maybe it is a
> paging issue as we shouldn't be returning so much stuff that the in
> memory stuff is an issue.  Are there particular api calls that are
> really bad?

search/revision is particularly bad if you have tens or hundreds of thousands of revisions, but I think most calls that use the controller could benefit. We've patched ours to have a limit, but it might be nicer to rebuild the API to have both soft (param) and hard (config) limits? And obviously a way of denoting that there are more to be fetched.
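
For illustration only, a minimal sketch of how a soft (param) limit could be clamped by a hard (config) limit, with a flag denoting that more results remain. This is not the patch mentioned above and not CKAN's actual API; every name here (HARD_LIMIT, DEFAULT_LIMIT, effective_limit, paginate) is hypothetical.

```python
# Hypothetical names throughout; not CKAN's actual API.
HARD_LIMIT = 1000     # assumed to come from the site config
DEFAULT_LIMIT = 100   # assumed default when the caller passes no limit


def effective_limit(requested, hard_limit=HARD_LIMIT, default=DEFAULT_LIMIT):
    """Clamp the caller's (soft) limit to the configured (hard) limit."""
    if requested is None:
        return min(default, hard_limit)
    return max(1, min(int(requested), hard_limit))


def paginate(ids, offset=0, limit=None):
    """Return one page of ids plus a flag saying whether more remain."""
    limit = effective_limit(limit)
    # Take one extra item so we can tell whether another page exists.
    window = ids[offset:offset + limit + 1]
    return {
        "results": window[:limit],
        "offset": offset,
        "limit": limit,
        "more": len(window) > limit,
    }
```

A client would keep requesting with offset += limit until "more" comes back false. Against a real database the slice would presumably become a LIMIT/OFFSET on the query rather than a slice of an in-memory list.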

> Much of the performance problem as I see it is that often we get data
> and then throw it away eg getting a list of items will get the whole
> items and then do `return [x['id'] for x in items]`

Yup, that and building a few hundred meg of dicts in RAM before JSON-encoding them and then concatenating the result with the JSONP wrapper ;)  For that particular call, as well as adding a limit, I stopped using _finish_ok and returned a generator, which itself used SQLAlchemy's yield_per. It was significantly faster and used a lot less memory. There's still room for improvement.
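
As a rough sketch of the generator approach described above (not the actual patched code), the response body can be yielded piece by piece so the full list, its JSON encoding, and the JSONP-wrapped string never all sit in memory at once. The function name stream_json_list and its callback parameter are hypothetical. The rows argument is any iterable; with SQLAlchemy it could be a query using yield_per, as noted in the comment.

```python
import json


def stream_json_list(rows, callback=None):
    """Yield a JSON array chunk by chunk, optionally wrapped for JSONP.

    `rows` can be any iterable of JSON-serialisable values. With
    SQLAlchemy it might be something like (assumed model and session):
        (rev_id for (rev_id,) in session.query(Revision.id).yield_per(500))
    which selects only the id column and fetches rows in batches.
    """
    if callback:
        yield callback + "("
    yield "["
    first = True
    for row in rows:
        if not first:
            yield ","
        first = False
        # Encode one item at a time instead of the whole list.
        yield json.dumps(row)
    yield "]"
    if callback:
        yield ");"
```

A WSGI application can return such a generator directly as the response body, so each chunk is sent as it is produced. Selecting only the id column also avoids fetching whole items just to discard everything but x['id'].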

Am sure you've better/other ideas on how to fix it, but I just wanted to check it was on the radar.

Thanks.

Ross




