[ckan-discuss] Test speed, sqlite and sqlalchemy

David Raznick kindly at gmail.com
Sun Dec 26 22:57:13 GMT 2010


I am new to this project, so Hello.  I am very interested in the concept
behind it and decided to see what my expertise could bring to it.  I know
database performance and sqlalchemy very well. I have even helped out a
little with the performance of sqlalchemy.

The first thing I noticed was the slow tests. I wanted to look at the tests
to learn how the project worked.  I was also concerned about the old
sqlalchemy (0.4). So I decided to see what I could do...

I have managed to get the tests down to running in about 9 mins on my oldish
laptop.

This was done by:

 * a patch I submitted on bug 868, which stopped the continual dropping and
creating of tables, as this has a large overhead.  Simply deleting
everything in all the tables is much faster.  (
http://knowledgeforge.net/ckan/trac/ticket/868)

 * turning off durability in postgres.  You do not need durability in
testing.  I even tried putting the database in a ramfs, but I decided the
complication was not worth it as the speed up was not significant.  The
settings are described at
http://www.postgresql.org/docs/9.0/static/non-durability.html and work for
8.x too.

 * upgrading to sqlalchemy 0.5.7  (this gives a good 15-20% speed up).

 * sorting out some really slow bits in the tests.  There were almost 3
mins of inessential sleeping/timeouts.
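
To illustrate the first item in the list above, here is a minimal sketch of
emptying every table between tests instead of dropping and recreating the
schema.  It is not the patch from ticket 868.  It uses the standard library
sqlite3 module purely so that the example is self-contained, and the table
names and the helpers table_names and delete_all_rows are hypothetical.

```python
import sqlite3


def table_names(conn):
    """Return the names of all user tables in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return [name for (name,) in rows]


def delete_all_rows(conn):
    """Empty every table but leave the schema in place."""
    for name in table_names(conn):
        conn.execute('DELETE FROM "%s"' % name)
    conn.commit()


# Hypothetical schema, created once for the whole test run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)")

# A test leaves some rows behind ...
conn.execute("INSERT INTO package (name) VALUES ('example')")
conn.execute("INSERT INTO tag (name) VALUES ('demo')")
conn.commit()

# ... and the teardown clears the data without touching the tables.
delete_all_rows(conn)
remaining = conn.execute("SELECT COUNT(*) FROM package").fetchone()[0]
```

On a database that enforces foreign keys, the tables would need to be
emptied in dependency order (child tables first).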

The trickiest bit was the upgrade to 0.5.  I have all the tests working
apart from 2 and I understand why they are not working.  They are not
critical and do not affect the speed.  This is for another discussion.

The functional subdirectory in the tests is by far the slowest (as it should
be), taking up 6.5 mins of the 9.

A speed per test sheet has been posted on ticket 868.

There has been some discussion on ticket 867 (
http://knowledgeforge.net/ckan/trac/ticket/876) about using an in-memory
sqlite as a backend.  In that discussion I am clearly against it.  I think
the complication it adds to the code base is not worth it.  I also do not
think that it will speed things up as much as hoped ...

I have done some profiling on the whole test suite.  Only about 110 seconds
of the 9 mins is spent waiting for the database.  So even if you managed to
get rid of all of that time (which you will not, as in-memory databases are
not instantaneous) you would save at most about a fifth of the run time.
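
As a rough check, taking the two figures quoted above at face value (110
seconds of database wait in a 9 minute run), the best possible gain from a
zero-cost database works out as follows.  The variable names are only
illustrative.

```python
# Figures quoted in the message.
total_seconds = 9 * 60      # whole test run
db_wait_seconds = 110       # time spent waiting for the database

# Share of the run that a faster database could remove at all.
db_fraction = db_wait_seconds / float(total_seconds)

# Best case: the database wait drops to zero.
best_case_seconds = total_seconds - db_wait_seconds
best_case_speedup = total_seconds / float(best_case_seconds)

print("database share of run: %.0f%%" % (db_fraction * 100))
print("best-case run time:    %d s" % best_case_seconds)
print("best-case speed up:    %.2fx" % best_case_speedup)
```

That puts the ceiling at roughly 20% of the run time, or a speed up of about
1.26x, before allowing for the fact that an in-memory database still takes
some time.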

Around 50% of the time is spent in sqlalchemy.  So the bottleneck is there.
I think upgrading to 0.6 may be just as fruitful.  sqlalchemy is undergoing
some big speed improvements (
http://techspot.zzzeek.org/2010/12/12/a-tale-of-three-profiles/).  There may
be some benefit from not going through the orm part for some performance
critical areas of the code.  However, we will need to analyse real user
database traffic to determine this.

As a side note, the really good news is that hardly any time is spent in
ckan code itself.
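
For anyone wanting to produce a similar breakdown, here is a minimal sketch
of attributing profiled time to a package by file path, using cProfile and
pstats from the standard library.  It is an assumption about how such
numbers could be gathered, not the method used for the figures above.  The
helper time_by_path and the stand-in workload function are hypothetical.

```python
import cProfile
import pstats


def time_by_path(stats, fragment):
    """Sum the own time of functions whose file path contains fragment."""
    total = 0.0
    for key, value in stats.stats.items():
        filename = key[0]       # key is (filename, line number, function)
        own_time = value[2]     # value is (cc, nc, tottime, cumtime, callers)
        if fragment in filename:
            total += own_time
    return total


def workload():
    """Stand-in for running the test suite."""
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler)
all_seconds = stats.total_tt
sqlalchemy_seconds = time_by_path(stats, "sqlalchemy")
ckan_seconds = time_by_path(stats, "ckan")
```

Dividing sqlalchemy_seconds or ckan_seconds by all_seconds gives the share
of the run spent in each package.  Own time excludes time spent in callees,
so time waiting on the database driver is counted separately from the time
spent in sqlalchemy itself.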

I hope you all had a good seasonal period.

David

Please get back to me on any details on the above!