[ckan-dev] [ckan-discuss] Test speed, sqlite and sqlalchemy

Seb Bacon seb.bacon at gmail.com
Thu Dec 30 14:09:47 UTC 2010


Hi,

Again with reference to below, have pushed further changes to
https://bitbucket.org/sebbacon/ckan.
Have cc'd ckan-dev for general dev interest and/or feedback!

On 30 December 2010 03:02, David Raznick <kindly at gmail.com> wrote:
> Hello
> I am emailing off the list as I think it would be too detailed for it.
>
>>
>>  I believe that David's objection to supporting sqlite as an
>> option is that it "unnecessarily adds complication to production code"
>> (David, is that right?), but I don't believe it does. I had to change
>> quite a lot of the *tests* because they assumed persistent fixtures,
>> but none of the main codebase.
>
>
> You could be right,  I expected more conversion errors for date, unicode and numeric types.  If you can get round these by changing the tests like harvesting unicode conversions then great. The extra speed may be worth it.

FYI I've seen no errors relating to date or numeric types at all.

>> With reference to the below, I've pushed all the work I've done so far
>> to https://bitbucket.org/sebbacon/ckan.
>
>
> I have done a little code review, as we have covered similar ground.   I have used a codereview tool.  Its a nice little app to stop the email from getting too much code in it.  The main link is http://codereview.appspot.com/3772048/ , however I will post the individual links.

That tool is very nice.

> **WARNING the diffs are in the wrong direction, the changes are on the left or are marked "-" in diffs.  Also click on the comments to see them properly**
> I will just do a list of points vaguely in order of importance.  I understand you are mid work on this but thought it might be some help.

Indeed -- two brains are much better than one :)  Especially as you
are clearly far better acquainted with sqlalchemy than me.  Shame we
weren't collaborating more earlier, but at least we know for next
time.

>
> * I seriously think that we should *not* change the models.  I think the unique constraints should stay. If we do remove them we will need a migration script.  Removing them causes lots of tests to fail.  e.g http://codereview.appspot.com/3772048/patch/1/9

Doesn't cause any tests to fail for me... can you specify?  I had
already created a migration script which I'd forgotten to commit (now
in there).

>http://codereview.appspot.com/3772048/diff/1/ckan/model/domain_object.py.  Postgres is slow at doing anything to do with table creation, dropping and inspecting.  The profiler says the  by_name function is run over 20,000 times during the tests.  This causes at least a 100 seconds slowdown in postgres.  The original code did not need to be changed.   As a side note, the calls to init_db also slow postgres down due to the repeated metadata.create_all(engine) calls.  Both these cause some of the discrepancy between our postgres timings.  My posgres takes about 15 mins with your branch.

OK, have reverted that change.  It was one of the earlier changes I
introduced to fix test failures -- I imagine it was one that has
subsequently been fixed by overriding the autoflush madness.  Either
way, the tests now pass without it.

Re init_db, I am not sure what the alternative to calling it in the
setup of most tests, if we want tests to run independently of each
other.

> * The new indexing stuff does not appear to work for me yet. I think you just have not added the new index.txt file. http://codereview.appspot.com/3772048/diff/1/ckan/model/__init__.py

Yes, this was unfinished.  The reason I paused here was because I was
getting a very odd "index already exists" error (odd in that it
appeared to be an exception thrown from the "self.metadata.create_all"
call in create_db).  I just rewrote it a little more cleanly and
uncommented the sample index in indexes.txt; try rerunning the tests
and let me know if it's obvious to you why it's failing.

Incidentally, I believe that the vast majority of the indexes that
were being created by 021_postres_upgrade.sql were redundant as they
were primary or foreign keys, so they should already *be* indexed.

> * The patch I sent was supposed to use delete not drop.  See comments in http://codereview.appspot.com/3772048/diff/1/ckan/model/__init__.py.

The reason I changed this to 'drop' was to accurately replace the
semantics of the clean_db method we're overriding in vdm's tools.py
(which does a metadata.drop_all())

On the other hand, I suppose a drop is much slower than a delete all,
and this is test-specific code, so perhaps I shouldn't worry about
that.  My concern is that any other tests which might be added in the
future may depend on the "drop" semantics of vdm's clean_db().

>http://codereview.appspot.com/3772048/patch/1/2.  The attached patch authz.patch  fixes your only major error.  The authz query is wrong in new versions of sqlalchemy. You are no longer allowed to use the parent table model.UserObjectRole in the filter clause as it results in a cross join and returns ALL the rolls!.  You have to use UserRole or GroupRole etc.  There is a way to get out these classes from the entity you put in as shown in the patch.

Great - I already knew about this error as noted in the tests but
wasn't sure of the right way to fix it.  Thanks!

>
>http://codereview.appspot.com/3772048/diff/1/ckan/controllers/package.py http://codereview.appspot.com/3772048/diff/1/ckan/lib/package_saver.py.    I solved the detached from session errors by adding a few Session().autoflush = False.   I am not sure your other fixes in these files are needed.
> The new version of sqlalchemy more aggressively autoflush. It does it when you try looping through a one to many collection i.e pkg.groups.   I personally hate autoflush.  Attached in autoflush.diff are where I put the stop autoflushes to stop the errors.

I hate autoflush too :)  It's horribly magical behaviour.

It doesn't fix the need to do the pkg.groups hack, though; without it
the test at functional.test_package.test_edit_bad_log_message still
fails.

>
> * the alphabet paging hack can be done properly in 0.6  http://codereview.appspot.com/3772048/patch/1/4.  by using http://www.sqlalchemy.org/docs/orm/query.html#sqlalchemy.orm.query.Query.column_descriptions

Thanks.  So I've tried a solution based on column_descriptions, but it
still looks hacky.  You'll see what I mean at
https://bitbucket.org/sebbacon/ckan/changeset/98198a6b0879#chg-ckan/lib/alphabet_paginate.py

Am I still missing a more elegant way of getting the necessary
entities via the API?

> * Attached is less_polling.diff which means you do not have to wait for a whole second each request.

Great.

The only other error left when running the test suite against
sqlalchemy 0.6.5 (in addition to the duplicate index issue mentioned
above) is the "stale association proxy" error that is commented in
functional.test_package.test_calculate_etag_hash.  I have to admit
that as a sqlalchemy newbie I'm rather baffled by it -- any ideas?

Cheers,

Seb




More information about the ckan-dev mailing list