[wdmmg-dev] UK Departmental Spending Performance

Carsten Senger senger at rehfisch.de
Sat May 21 12:08:29 UTC 2011



--On Freitag, Mai 20, 2011 17:46:29 +0200
Friedrich Lindenberg <friedrich.lindenberg at okfn.org> wrote:

> I've done some more performance testing on the current production
> setup without discovering much of interest: things generally could be
> faster (especially in-site entry listings) but it scales OK. Here's
> the full spreadsheet of measurement results:
>
>
> https://spreadsheets0.google.com/ccc?key=t0dl_oBBcU-Oy1JwwHZsk1g&authkey=
> CM-wvJIB#gid=0

I've looked at the spreadsheet and saw you tuned the entry listing
with a 'name' index (rows 10-13).

The loader used to add the index. I made the failure to remove the index
when I implemented urls based on mongodb oids and 'name' is no interesting
field anymore.

I now did a manual test and the query that is used when you call a url
like:

<http://localhost:5000/entry/4dd02bf79d12a7050a0097f2>

is created by wdmmg.model.mongo.Base.by_id():

{'$or': [{'name': u'4dd02bf79d12a7050a0097f2'},
         {'_id': u'4dd02bf79d12a7050a0097f2'},
         {'_id': ObjectId('4dd02bf79d12a7050a0097f2')}]}


This query is responsible for the bad performance. Even if we do have an
index on '_id', the index is only used if we also have an index on 'name'.

With both indexes, 'name' and '_id':


db.entry.find({$or: [{'name': '4dd02bf79d12a7050a0097f2'}, {_id: o}, {_id: 
'4dd02bf79d12a7050a0097f2'}]}).explain()
{
	"clauses" : [
		{
			"cursor" : "BtreeCursor name_1",
			"nscanned" : 0,
			"nscannedObjects" : 0,
			"n" : 0,
			"millis" : 0,
			"nYields" : 0,
			"nChunkSkips" : 0,
			"isMultiKey" : false,
			"indexOnly" : false,
			"indexBounds" : {
				"name" : [
					[
						"4dd02bf79d12a7050a0097f2",
						"4dd02bf79d12a7050a0097f2"
					]
				]
			}
		},
		{
			"cursor" : "BtreeCursor _id_",
			"nscanned" : 1,
			"nscannedObjects" : 1,
			"n" : 1,
			"millis" : 0,
			"nYields" : 0,
			"nChunkSkips" : 0,
			"isMultiKey" : false,
			"indexOnly" : false,
			"indexBounds" : {
				"_id" : [
					[
						ObjectId("4dd02bf79d12a7050a0097f2"),
						ObjectId("4dd02bf79d12a7050a0097f2")
					]
				]
			}
		},
		{
			"cursor" : "BtreeCursor _id_",
			"nscanned" : 0,
			"nscannedObjects" : 0,
			"n" : 0,
			"millis" : 0,
			"nYields" : 0,
			"nChunkSkips" : 0,
			"isMultiKey" : false,
			"indexOnly" : false,
			"indexBounds" : {
				"_id" : [
					[
						"4dd02bf79d12a7050a0097f2",
						"4dd02bf79d12a7050a0097f2"
					]
				]
			}
		}
	],
	"nscanned" : 1,
	"nscannedObjects" : 1,
	"n" : 1,
	"millis" : 0
}


Without an index on 'name':

> db.entry.dropIndex('name_1')
{ "nIndexesWas" : 10, "ok" : 1 }
> db.entry.find({$or: [{'name': '4dd02bf79d12a7050a0097f2'}, {_id: o}, 
{_id: '4dd02bf79d12a7050a0097f2'}]}).explain()
{
	"cursor" : "BasicCursor",
	"nscanned" : 365113,
	"nscannedObjects" : 365113,
	"n" : 1,
	"millis" : 4362,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		
	}
}

I tried to use mongodb's hint() to force it to try the _id index first,
but that does not work.

The root of all evil(tm) is the Base.by_id() method (Not only the
implemention, also the name ;). When I switched to oid urls I wanted to
refactor it into .by_id() and .by_name(). But the versatility of .by_id()
is used for the magic in wdmmg.lib.restapi.RestAPIMixin() (find a
context object and pass it to the real view action) and didn't want
touch this. I did not spot the performance implication.

I changed the loader to create the entry.name index again, but I suggest
that we clean that up at some point. I can create a ticket for that.

..Carsten







More information about the openspending-dev mailing list