[wdmmg-dev] openspending.org back up

Carsten Senger senger at rehfisch.de
Sat Mar 19 22:27:50 UTC 2011



--On Saturday, March 19, 2011 17:34:18 +0100 Friedrich Lindenberg 
<friedrich.lindenberg at okfn.org> wrote:

> On Sat, Mar 19, 2011 at 1:23 AM, Stefan Wehrmeyer
> <stefanwehrmeyer at gmail.com> wrote:

[...]

>> So it sounds like we need a scaling plan, because in the end we want the
>> data to be available. I think we need to figure out exactly what kind of
>> queries we want to support, have these indexes ready and forbid other
>> queries. It would be a good idea to abstract all read queries into class
>> methods and stop calling .find arbitrarily.
>
>
> I wouldn't start refactoring before having a clear picture of what's
> going on - i.e. record mongostat output and query times for several
> database sizes. If memory is the issue (as I suspect - we have likely
> grown above 3G in indexes and that means they get swapped to disk),
> then no amount of refactoring will solve that.

Why would indexes get swapped out to disk once they grow past 3 GB? If
that's the case we seem to be lucky, because it's not our problem: we had
900 MB of indexes before dropping and 100 MB after.

Here are the db stats. Unfortunately I did not record collection stats beforehand.
Before:
> db.stats()
{
	"db" : "openspending-planet-a",
	"collections" : 7,
	"objects" : 1441301,
	"avgObjSize" : 4597.885216203971,
	"dataSize" : 6626936560,
	"storageSize" : 8100501504,
	"numExtents" : 55,
	"indexes" : 13,
	"indexSize" : 917381120,
	"fileSize" : 12812550144,
	"ok" : 1
}

After:
> db.stats()
{
	"db" : "openspending-planet-a",
	"collections" : 7,
	"objects" : 326129,
	"avgObjSize" : 2800.1458073339077,
	"dataSize" : 913208752,
	"storageSize" : 931017472,
	"numExtents" : 40,
	"indexes" : 13,
	"indexSize" : 99966976,
	"fileSize" : 4226809856,
	"ok" : 1
}
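For next time, per-collection figures (including the size of each
individual index) are available in the mongo shell. This needs a live
mongod, and "entry" below is just an example collection name, not
necessarily one of our seven:

```javascript
// Run in the mongo shell against the live database.
db.entry.stats()           // per-collection dataSize, storageSize, indexSizes
db.entry.totalIndexSize()  // total bytes of all indexes on this collection
```

The indexSizes map in the stats output would tell us which of the 13
indexes dominate the ~900 MB figure above.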

> In that case we may want to
> reduce the number of indexes we keep and also think about 64-bit kinds
> of memory.
>
> We may also want to limit the denormalization of data we do on the
> entries collection, i.e. only store "id", "name", "ref" and "label" on
> the copied sub-types of dataset, entry and entity. Looking at an entry
> from FTS that would cut the collection size by about half and we could
> fake most of the lost information via dereference() and solr as
> needed.
>
> As a final note, we could consider a different model for
> pre-aggregation where we store much less data (i.e. the output of
> mongodb's own aggregation functions) and don't materialize
> aggregations in the full way we do at the moment.

Yes, once we know what contributed to our problem and how far we need to
scale, we have a lot of options to evaluate.
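To illustrate the pre-aggregation idea: instead of materializing full
aggregated entry documents, we would store only the small group-and-sum
results that mongodb's map/reduce or group() would emit. The entry
documents and field names below (dataset, time, amount) are hypothetical
examples, not the real schema:

```javascript
// Hypothetical entries; in mongodb these would come from the
// entries collection rather than an in-memory array.
var entries = [
  { dataset: "fts", time: "2010", amount: 100 },
  { dataset: "fts", time: "2010", amount: 250 },
  { dataset: "fts", time: "2011", amount: 75 }
];

// Group-and-sum over one key field - the shape of output that
// map/reduce would produce and that we would store instead of
// fully materialized aggregations.
function aggregate(entries, key) {
  var out = {};
  entries.forEach(function (e) {
    out[e[key]] = (out[e[key]] || 0) + e.amount;
  });
  return out;
}

var byYear = aggregate(entries, "time");
// byYear is { "2010": 350, "2011": 75 }
```

Only these small result documents would need indexes, which is where the
memory savings would come from.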

I've started a wiki page to collect requirements and ideas to work on
performance:
<http://wiki.openspending.org/Optimization_and_Scaling_Considerations>

>>  A proper robots.txt file that disallows /api pages would help, I
>> created a ticket for that: http://trac.openspending.org/ticket/50
>
> Cool.

That's an excellent idea.
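For reference, a minimal robots.txt along the lines of ticket #50 could
look like the fragment below; whether /api is the only path worth
blocking is still to be decided:

```text
User-agent: *
Disallow: /api
```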

..Carsten
