[openspending-dev] The Front Fell Off

Tryggvi Björgvinsson tryggvi.bjorgvinsson at okfn.org
Wed Apr 16 06:24:09 UTC 2014

On Fri 11 Apr 2014 11:34, Friedrich Lindenberg wrote:
> I got up this morning with three different people pinging me about OS not working. It's quite apparent that - whatever merits the Amazon hosting solution might have - it does not work in practice. To help with that, I would like to offer my support in moving OS to a new dedicated machine [e.g. 1] next week.

So finally I have time to reply.

First of all, thank you Friedrich for this offer. It's a great one, but
I don't think it's the right action. The problems we have been having
are not caused by Amazon, although it is true that they became more
apparent after the move to Amazon. We had already started experiencing
them on Rackspace, and quite frankly there was an even bigger concern
there: we were not able to back up our database on Rackspace.

So what is the problem? We've now finally had some time to investigate,
and we think we have figured it out.

The problem is that OpenSpending has grown a lot in the last year, which
has resulted in our search index becoming just under 14GB in size. For
search to work properly we need all of that in the server's memory, plus
some more. The server we're using for our search (the Solr instance) has
7.5GB of memory, so it is very badly under-resourced. We would need at
least three servers like that to keep OpenSpending running.
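
As a rough sketch of the arithmetic behind "at least three servers": the
14GB and 7.5GB figures are the ones above, while the 25% headroom standing
in for "plus some more" is purely an assumption for illustration.

```python
import math

INDEX_GB = 14.0    # search index size quoted above ("just under 14GB")
SERVER_GB = 7.5    # memory of the current Solr instance
HEADROOM = 0.25    # ASSUMPTION: 25% extra for "plus some more"


def servers_needed(index_gb, server_gb, headroom):
    """Number of equally sized servers whose combined memory covers
    the index plus the given fractional headroom."""
    required_gb = index_gb * (1 + headroom)
    return math.ceil(required_gb / server_gb)


print(servers_needed(INDEX_GB, SERVER_GB, HEADROOM))
```

With no headroom at all the same calculation gives two servers, so the
third machine is what pays for the "plus some more".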

Why did we buy such a small server? The reasons are what Rufus hinted at
in his reply: we need money to run the infrastructure. This is getting
very costly for a service which is provided for free, and it means that
we will have to look into sponsored infrastructure, charging for use
somehow, or a mix of both.

When we decided which server to buy, we looked at the old server hosted
on Rackspace. That server had 16GB of memory, but the memory was not
used solely by Solr: it was shared with our database server (and our
celery workers). We made the mistake of assuming that most of this
memory was needed for the database, and decided we wouldn't need as much
for Solr (little did we know that Solr actually needed even more memory
than that). So in order to cut costs and stick to the minimum (because
we don't want Open Knowledge to be paying for resources they don't
need), we went with the 7.5GB instance on Amazon.

It does not help us that when solr runs out of memory and stops
responding, the OpenSpending code base doesn't handle that gracefully
and everything just stops working.
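
To illustrate what handling this gracefully could look like, here is a
minimal sketch. It is not the OpenSpending code: search_with_fallback and
the run_query callable are hypothetical names, and the idea is only that a
failing search backend produces an empty, flagged result instead of an
unhandled error.

```python
def search_with_fallback(run_query, query):
    """Run a search through run_query (a hypothetical callable that talks
    to the search backend). If the backend is unreachable or times out,
    return an empty result flagged as unavailable instead of raising."""
    try:
        return {"results": run_query(query), "search_available": True}
    except OSError as exc:
        # Connection refused, socket timeouts and urllib's URLError are
        # all OSError subclasses, so they end up here.
        return {"results": [], "search_available": False, "error": str(exc)}


def broken_backend(query):
    # Stand-in for a search server that has stopped responding.
    raise ConnectionRefusedError("search backend is not responding")


print(search_with_fallback(broken_backend, "health"))
```

The rest of the site could then check search_available and show a
"search is temporarily unavailable" notice while everything else keeps
working.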

How are we going to solve this then?

The short-term solution is to add more shards to Solr to increase the
available memory, i.e. split the index up between a few machines. This
gives us the option of adding machines as we grow, or removing machines
as we shrink.
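
For those who have not used sharding: Solr's distributed search takes a
"shards" request parameter listing the machines that each hold a part of
the index, and merges their results. A minimal sketch of building such a
query follows. The host names are made up, and distributed_query_url is a
hypothetical helper, not something in our code base.

```python
from urllib.parse import urlencode

# ASSUMPTION: example host names, not our real machines.
SHARDS = [
    "solr1.example.org:8983/solr",
    "solr2.example.org:8983/solr",
    "solr3.example.org:8983/solr",
]


def distributed_query_url(base_url, shards, query):
    """Build a Solr select URL that fans the query out to every shard."""
    params = {"q": query, "wt": "json", "shards": ",".join(shards)}
    return base_url + "/select?" + urlencode(params)


print(distributed_query_url("http://solr1.example.org:8983/solr", SHARDS, "health"))
```

Adding or removing a machine then comes down to changing the shard list
(and redistributing the index), not changing how queries are written.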

Shrink? Yes. The long-term solution is going to be to rethink our
search implementation/strategy.

Do we really need such a big index? I don't think so. We can for example
stop indexing private datasets. Providing search for datasets that not
all users have access to (the part of us we can call ClosedSpending)
while we're struggling with the index size is not something we should be
doing.
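
A sketch of what skipping private datasets at indexing time could look
like. It assumes each dataset is a dict with a boolean "private" key; that
shape and the datasets_to_index name are illustrative only.

```python
def datasets_to_index(datasets):
    """Keep only datasets that are not marked private.
    ASSUMPTION: each dataset is a dict with an optional 'private' flag."""
    return [d for d in datasets if not d.get("private", False)]


example = [
    {"name": "public-budget", "private": False},
    {"name": "draft-budget", "private": True},
    {"name": "old-budget"},  # no flag set: treated as public here
]
print([d["name"] for d in datasets_to_index(example)])
```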

We will also have to look into what fields we really want to index and
store. At the moment that's all fields. I think we can standardise the
dataset fields to some extent and provide search for those fields and
drop search for the dynamic fields.
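
To make the idea concrete, a small sketch of restricting what goes into
the index to a fixed set of fields. The field names in STANDARD_FIELDS are
placeholders I made up for illustration; which fields we actually
standardise on is exactly the decision we still have to make.

```python
# ASSUMPTION: placeholder field names, not an agreed standard.
STANDARD_FIELDS = ("dataset", "amount", "time", "from", "to")


def to_index_document(entry, fields=STANDARD_FIELDS):
    """Return a copy of entry containing only the standardised fields,
    dropping any dataset-specific (dynamic) fields."""
    return {key: value for key, value in entry.items() if key in fields}


entry = {
    "dataset": "example-budget",
    "amount": 1200.0,
    "time": "2013",
    "programme_code": "X-17",  # dynamic, dataset-specific field
}
print(to_index_document(entry))
```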

This results in less functionality around the search, but will actually
increase comparability of datasets. These are only ideas but we will
have to make decisions and cuts like these in order to be able to
provide services users of OpenSpending can rely on.

Any help is welcome. This does not have to be around coding. Coding is
just a very small part of the solution. We need help to look into the
option of sponsored hosting, or how to get OpenSpending money to pay for
the increasing server power we need. We also need help to rethink our
search implementation. What can we do to reduce the size of our search
index? (Ideas in the same area as discussed above, such as removing
private datasets from search or removing the possibility of searching
all fields.)

I hope this gives a good update as to where we are and what we as the
tech community around OpenSpending need to do to fix this big problem.


Tryggvi Björgvinsson

Technical Lead, OpenSpending

The Open Knowledge Foundation <http://okfn.org>

/Empowering through Open Knowledge/

http://okfn.org/ | @okfn <http://twitter.com/OKFN> | OKF on Facebook
<https://facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/> |
Newsletter <http://okfn.org/about/newsletter>
