[ckan-dev] test performance and sqlalchemy complete

David Read david.read at okfn.org
Tue Jan 11 13:43:15 UTC 2011


I've now merged Seb and David's code into CKAN core, so we can all now
use SQLAlchemy 0.6, SQLite and ultra-fast testing. My tests now run in
under 3 minutes - super work chaps :-)

(I used David's test init improvements, and removed the monkey patch
for SQLite accordingly.)

To enable the quick tests, edit your development.ini to use:
    sqlalchemy.url = sqlite:///
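
In a stock development.ini that line normally lives under the [app:main]
section (section name from memory - check your own config), i.e. roughly:

    [app:main]
    ...
    sqlalchemy.url = sqlite:///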

If any issues crop up then please do say. Cheers,

David

On 7 January 2011 12:14, David Raznick <kindly at gmail.com> wrote:
>
>
> On Fri, Jan 7, 2011 at 10:42 AM, Seb Bacon <seb.bacon at gmail.com> wrote:
>>
>> Great.
>>
>> No time today to look into it further, but here are some questions
>> (patches applied in any case):
>>
>>  - As I understand it, by turning autoflush off, the UOW code gets the
>> chance to collapse the two deletes into one? Is that how the patch
>> works?
>
> Yep.  Within a flush, sqlalchemy makes sure every object is unique, so it
> only does one thing per object. A delete may happen that you do not know
> about because it is cascaded, which only adds to the confusion.
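>
> A rough sketch of the idea (illustrative setup, not ckan's actual
> session code):
>
>     from sqlalchemy import create_engine
>     from sqlalchemy.orm import sessionmaker
>
>     engine = create_engine("sqlite://")   # in-memory sqlite
>     Session = sessionmaker(bind=engine, autoflush=False)
>     session = Session()
>
>     # With autoflush off, nothing is written out until an explicit
>     # session.flush()/commit(), so a cascaded delete and a later
>     # explicit delete of the same object land in the same flush, where
>     # the unit of work collapses them into a single DELETE.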
>
>>
>>  - So sqlite exposed rather than caused the error, because these
>> should get collapsed even between flushes?
>
> Yes.
>
>>  Does this mean there's
>> actually a bug in the postgres sqlalchemy dialect?
>
> Well, they obviously think the dialect is not 'sane', as the flag the
> monkeypatch flipped was supports_sane_multi_rowcount.  It looks like none
> of the postgres dialects support it, so it's a postgres thing.  They do
> support supports_sane_rowcount.  The difference is that
> supports_sane_rowcount covers dbapi execute() calls and
> supports_sane_multi_rowcount covers executemany() calls.  executemany()
> is faster as it batches the commands.  At the moment executemany() is
> only used for deletes, but in sqlalchemy 0.7 it will be used for inserts
> too.
>
>>
>>  - Sqlalchemy's autoflush behaviour increasingly seems evil.  Could we
>> consider turning it off by default?
>
>
> This would make me happy, as autoflushes are a pain to debug and slower
> too. There are about 40 places in ckan (found using grep) where autoflush
> is explicitly switched off because it was causing problems.  However it's
> not as trivial as I hoped; there is a fair bit of code relying on it.  If
> I get the time I will have a go.
>>
>> Re. the postgres drop table thing: I thought I'd already done that
>> myself (I'd done something like it, backed out the changes to compare
>> speed, and forgot to reinstate them) so thanks for spotting that :)
>> The postgres tests now run in 10 minutes for me.
>>
>> Out of interest, I also did a proper like-for-like test comparing
>> postgres and sqlite against the same tests (e.g. skipping all the
>> search-related tests for both).  Sqlite is still about 20% faster,
>> though this difference stops being so significant when test times are
>> well under 10 minutes.
>>
>> Thanks,
>>
>> Seb
>>
>> On 7 January 2011 01:14, David Raznick <kindly at gmail.com> wrote:
>> > I have looked at the error and now think the monkeypatch is not the
>> > right choice, as the errors are correct.
>> >
>> > The errors are due to us deleting the same row in the database twice,
>> > i.e. doing "delete from package where package.id = 1" twice.  We
>> > should not be doing this.  In postgres this does not cause an error,
>> > because postgres does not report accurately to sqlalchemy how many
>> > rows it deleted, so sqlalchemy has no way of knowing that a delete did
>> > not actually do anything (i.e. it had already happened).  Sqlite does
>> > report this accurately, and it shows that the tests and code delete
>> > the same row twice in a lot of places.  The monkeypatch just turns
>> > this checking off.
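>> >
>> > A minimal sketch of the situation (one-table illustrative model, not
>> > ckan's real one):
>> >
>> >     from sqlalchemy import create_engine, Column, Integer, text
>> >     from sqlalchemy.orm import sessionmaker
>> >     from sqlalchemy.ext.declarative import declarative_base
>> >
>> >     Base = declarative_base()
>> >
>> >     class Package(Base):
>> >         __tablename__ = "package"
>> >         id = Column(Integer, primary_key=True)
>> >
>> >     engine = create_engine("sqlite://")
>> >     Base.metadata.create_all(engine)
>> >     session = sessionmaker(bind=engine, expire_on_commit=False)()
>> >
>> >     pkg = Package(id=1)
>> >     session.add(pkg)
>> >     session.commit()
>> >
>> >     # something else removes the row behind the ORM's back -- in our
>> >     # case an earlier autoflush of a cascaded delete plays this role
>> >     session.execute(text("DELETE FROM package WHERE id = 1"))
>> >
>> >     session.delete(pkg)   # the session still thinks the row exists
>> >     session.flush()       # sqlite reports 0 rows matched, so
>> >                           # sqlalchemy sees the stale delete and
>> >                           # complains; postgres' inaccurate rowcount
>> >                           # lets it pass silently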
>> >
>> > Anyway, in ticket 868 I have attached patches for vdm and ckan so that
>> > the tests pass without the monkeypatch.  The issues were mainly due to
>> > an autoflush deleting a row via a cascade and then the same delete
>> > being done again later.  This patch has the benefit of working on
>> > sqlalchemy 0.5, and it also means we do less flushing/work, so things
>> > are (very slightly) faster.
>> >
>> > Also....
>> >
>> > I have attached another new patch to 868 (it works off Seb's branch)
>> > that brings the postgres times down from 14 mins to 9 mins on my
>> > laptop.
>> >
>> > David
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Jan 6, 2011 at 5:17 PM, James Gardner <james at 3aims.com> wrote:
>> >>
>> >> I'm personally fine with this particular monkey patch and in this
>> >> specific case think it is less confusing than installing a forked
>> >> dependency.
>> >>
>> >> James
>> >>
>> >>
>> >> On Thu, 2011-01-06 at 16:56 +0000, Seb Bacon wrote:
>> >> > On 6 January 2011 16:12, William Waites <ww at eris.okfn.org> wrote:
>> >> > > * [2011-01-06 14:28:03 +0000] Seb Bacon <seb.bacon at gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > > ] I've committed a monkeypatch now.
>> >> > >
>> >> > > please roll that back and just put a patched sqlalchemy up and
>> >> > > refer
>> >> > > to it in the pip requirements.
>> >> > >
>> >> > > third party apps monkey patching libraries wreak havoc with testing
>> >> > > local changes, etc, to those libraries.
>> >> > >
>> >> > > please do not do this.
>> >> >
>> >> > While I agree with the general point about monkeypatches:
>> >> >
>> >> > (a) this patch, which only involves changing a flag from True to
>> >> > False, is only applied when running tests against a sqlite backend.
>> >> > I can't think of a scenario whereby this would wreak any havoc?
>> >> >
>> >> > (b) putting up a patched tarball of sqlalchemy isn't painless either.
>> >> > It creates a bit more cognitive-maintenance overhead for when we do
>> >> > sqlalchemy updates (e.g. how will whoever does an update in a year
>> >> > know why we have our own version? should we maintain the patch next
>> >> > to
>> >> > it somewhere for that? etc)
>> >> >
>> >> > The point in (b) isn't a big deal, but I maintain neither is the
>> >> > potential issue with a monkeypatch, *in this specific case*.  Plus, I
>> >> > haven't got time to think about it again until next week ;-)
>> >> >
>> >> > Have a look at the patch
>> >> > (https://bitbucket.org/sebbacon/ckan/changeset/ad9f51642166), and if
>> >> > you're still convinced it might screw things up I'll happily change
>> >> > it
>> >> > next week.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Seb
>> >> >
>>
>>
>>
>> --
>> skype: seb.bacon
>> mobile: 07790 939224
>> land: 0207 183 9618
>> web: http://baconconsulting.co.uk
>>