[ckan-dev] tests progress

Seb Bacon seb.bacon at gmail.com
Wed Dec 22 16:04:51 UTC 2010


Hi all,

This is a pretty long email summarising where I'm at with tests /
sqlite / sqlalchemy upgrade.  For the time-poor, the summary is:

I still need to spend another day getting all the old tests passing,
but I'm getting there.  It's definitely looking good for a decent
test run time, perhaps even under 5 minutes.  In the meantime, if
anyone wants to review my current changes or offer any other
observations, that would be great.

----

I decided to start on CKAN by making the tests run faster than the
41 minutes they currently take on my laptop.  It's a good way of
learning about the code, apart from anything else.

Looking at the test profiles, it was clear that 99% of the time was
spent in database I/O (including the overhead of setting up
connections), so the biggest win appeared to be moving the tests to a
SQLite in-memory database.

In order to do this, I quickly found out I needed to upgrade the
venerable SQLAlchemy packaged with CKAN, from 0.4.8 to 0.6.5.

So the task became two parts: upgrade SQLAlchemy, and get the tests
running against a SQLite in-memory database.

Rufus sensibly cautioned that these would be best attempted
separately; however, despite my best efforts, this turned out to be
practically impossible.  The SQLAlchemy upgrade broke a number of
tests, and the run-tests fix-tests cycle was just too painful.  So the
approach I've taken is: fix the tests with the sqlite in-memory
database, *then* try running and fixing them against postgres and the
upgraded SQLAlchemy.

It's proved much harder than I expected, partly because of the sheer
number of tests that need to be modified in minor ways, and partly
because of a couple of particularly gnarly bugs.  The work still isn't
finished but this is an update on progress.

Currently, the code lives in a branch at

   https://bitbucket.org/sebbacon/ckan/src/9da772927de6

To try it, you will need to alter your test.ini:

 [app:main]
 sqlalchemy.url = sqlite:///

You'll also need to apply sqlalchemy.patch (in the root of the
software checkout) to the installed sqlalchemy.
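
For reference, that URL is all SQLAlchemy needs to produce an
in-memory engine.  A minimal sketch using plain SQLAlchemy (not CKAN's
own config loading):

  from sqlalchemy import create_engine

  # "sqlite:///" with no path gives a database that lives entirely in
  # memory and disappears when the engine/connection goes away --
  # nothing is written to disk.
  engine = create_engine("sqlite:///")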

Summary of changes
------------------

1) Assumption of persistent fixtures

In many cases, the tests only work because in previous tests, various
fixtures had been set up.  Obviously, when running against an
in-memory database, this becomes a problem.  It's a problem anyway if
you want to be able to run individual tests, which we should be able
to do if we're to make running tests easier.

Generally speaking, the fix has been to add "model.repo.init_db()" or
"CreateTestData" calls to test setup methods.

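As a (hypothetical) illustration of the kind of change involved -- the
exact imports and fixture names vary from test to test -- a test class
that used to assume data left behind by earlier tests now does
something like:

  from ckan import model
  from ckan.lib.create_test_data import CreateTestData

  class TestPackageEdit(object):
      @classmethod
      def setup_class(cls):
          # Build the schema and the standard fixtures explicitly,
          # rather than relying on a previous test having created them.
          model.repo.init_db()
          CreateTestData.create()

      @classmethod
      def teardown_class(cls):
          # Leave a clean slate for whatever runs next.
          model.repo.rebuild_db()
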
2) Skip postgres-specific functionality

Postgres-specific functionality that won't work in sqlite includes:

 - text indexing / searching
 - regex matching (used in alphabetic batching)
 - setting up indexes

For now, tests relating to the first two are conditionally skipped
when running against sqlite, and the indexes are simply not created.
In future, search should probably be implemented via pluggable search
backends, including a null search plugin, perhaps?
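
The skips themselves are just a guard in test setup, along these lines
(a sketch -- the exact attribute used to find the engine may differ in
the branch):

  from nose.plugins.skip import SkipTest
  from ckan import model

  class TestSearch(object):
      def setup(self):
          # Full-text search and regex matching only work on postgres,
          # so skip when the tests are running against sqlite.
          if model.meta.engine.name == 'sqlite':
              raise SkipTest("search needs postgres")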

3) Unicode issues

The harvester tests were returning UTF-8 bytestrings, which the sqlite
driver (quite rightly) didn't approve of.  I changed the harvester
get_content code to decode the content to unicode.
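
The change amounts to decoding at the point the content arrives,
roughly like this (a sketch; the real get_content is a method on the
harvester and does rather more):

  import urllib2

  def get_content(url):
      # The remote source hands back UTF-8 encoded bytes; decode them
      # so the model only ever sees unicode objects, which is what the
      # sqlite driver insists on for text columns.
      raw = urllib2.urlopen(url).read()
      return raw.decode('utf-8')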

4) SQLite in-memory optimisation

For large tests, the in-memory database was growing very large and not
being garbage-collected, I think because the default connection pool
for sqlite is a singleton pool.  Adding a call to
model.Session.connection().invalidate() in test teardowns made the
tests run noticeably faster.
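
In practice that is just an extra line in the shared teardown,
something like this (the surrounding teardown code is an assumption):

  from ckan import model

  def teardown(self):
      # Invalidate the pooled connection so the in-memory database --
      # and everything that has accumulated in it -- can actually be
      # discarded rather than lingering in sqlite's singleton pool.
      model.Session.connection().invalidate()
      model.Session.remove()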

5) Crazy non-deterministic revision-based errors

I lost lots of time on this.  I found that running tests involving new
revisions multiple times sometimes failed, sometimes didn't.  I
eventually deduced that this would sometimes fail:

 group = model.Group()
 model.Session.add(group)
 group.name = "b0rked"
 model.repo.new_revision()
 model.Session.flush()

But this would never fail:

 model.repo.new_revision()
 group = model.Group()
 model.Session.add(group)
 group.name = "b0rked"
 model.Session.flush()

It turned out that the non-deterministic failures were related to
changes in sqlalchemy's topological sort, which is called to decide in
which order to execute outstanding actions on a flush.

The current (0.6.5) implementation of the sort ultimately relies on
both dictionary values and sets, which are of course unordered
collections (see topological.py, which iterates over sets to decide
the order; the variable "allitems" there is ultimately derived from
dictionary values, which are likewise unordered).  So I assume this is
a bug -- it would certainly explain the non-deterministic nature of
the errors.
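
As a toy illustration of the underlying problem (nothing to do with
CKAN itself): the order in which arbitrary objects come back out of a
set depends on their hashes, which by default derive from memory
addresses, so it can differ from one interpreter run to the next:

  class Node(object):
      def __init__(self, name):
          self.name = name

  # The relative order of 'revision' and 'group' here is not defined
  # and can flip between runs -- exactly the sort of thing that makes
  # a flush-ordering bug appear and disappear.
  nodes = set([Node('revision'), Node('group')])
  print([n.name for n in nodes])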

I'd wasted enough time on this already without trying to work out the
correct fix in sqlalchemy, so I just changed the tests so that any
call to new_revision() happened *before* any changes to an object.
This seemed to sort out any ambiguity in ordering (as the changes set
up by new_revision() were always flushed by the time any other changes
happened).

However, this feels brittle: is it by design that the order of setting
up a new_revision and then changing an object previously didn't
matter?  Were we previously relying on SQLAlchemy to sort out the
right dependency tree for us, and if so on what basis?  I am a little
hazy about this.

6) Form validation and database integrity

With the SQLAlchemy upgrade, tests which checked that users aren't
allowed to set empty fields (e.g. package names) started throwing
IntegrityErrors, because the underlying column (i.e. 'name') has
unique=True in its table definition (and the tests were resetting
names to blank more than once).

The sqla upgrade seems to allow IntegrityErrors to bubble up earlier
than before -- I didn't get round to working out why.
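
Here is a toy sketch of the behaviour (not CKAN's actual table
definition), showing the constraint being enforced by the database
rather than by form validation:

  from sqlalchemy import (create_engine, MetaData, Table, Column,
                          Integer, UnicodeText)
  from sqlalchemy.exc import IntegrityError

  metadata = MetaData()
  package = Table('package', metadata,
                  Column('id', Integer, primary_key=True),
                  Column('name', UnicodeText, unique=True))

  engine = create_engine('sqlite:///')
  metadata.create_all(engine)
  conn = engine.connect()

  conn.execute(package.insert(), name=u'')
  try:
      # A second blank name violates the unique constraint, so the
      # error surfaces from the database before validation gets a say.
      conn.execute(package.insert(), name=u'')
  except IntegrityError:
      print('unique constraint on name kicked in')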

*Should* name-uniqueness be enforced at the database level?

7) Other random errors

I'm getting there, and the tests all run in about 6 minutes; however,
there are a few tests that still need fixing and examining.  If you
want to have a look, they are basically all the tests with XXX near
them in the changes since the start of my branch.  Most of them are in
ckan/tests/functional/test_package.py

One change I'd love a review of in particular is ckan/lib/package_saver.py.

Cheers

Seb



