[openspending-dev] The Front Fell Off
rufus.pollock at okfn.org
Wed Apr 16 07:34:46 UTC 2014
First off: thanks for the excellent (and detailed) update Tryggvi :-)
Aside: I think we probably do want a post about this so turning your
material into a post which we can share may be a great idea.
Idea: (this may be completely untenable) could we consider dropping solr,
or at least dropping it as a requirement for OS to function? I'm a bit ignorant here
but my understanding is that solr is used for searching transactions. We
could move this to postgres, or, super-crazy, if we have decent google
indexing we could use google custom search for the time being :-)
Just a thought and very happy to be told this is a crazy / impossible idea
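To make the postgres option concrete, here is a minimal sketch of what a transaction search could look like using PostgreSQL's built-in full-text search (`to_tsvector` and `plainto_tsquery`). The table name `entry` and the columns `id`, `dataset` and `label` are assumptions for illustration, not the actual OpenSpending schema, and the function only builds the query rather than running it.

```python
def build_search_query(terms, dataset=None, limit=20):
    """Return (sql, params) for a full-text search.

    The `entry` table and its `id`, `dataset` and `label` columns are
    hypothetical; adjust them to the real schema.
    """
    sql = (
        "SELECT id, dataset, label "
        "FROM entry "
        "WHERE to_tsvector('english', label) @@ plainto_tsquery('english', %s)"
    )
    params = [terms]
    if dataset is not None:
        # Optionally restrict the search to a single dataset.
        sql += " AND dataset = %s"
        params.append(dataset)
    sql += " ORDER BY id LIMIT %s"
    params.append(limit)
    return sql, params
```

With a DB-API driver such as psycopg2 this could be run as `cursor.execute(*build_search_query("road maintenance"))`. For anything but small tables the expression would need a GIN index on `to_tsvector('english', label)` to be usable.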
On 16 April 2014 07:24, Tryggvi Björgvinsson
<tryggvi.bjorgvinsson at okfn.org> wrote:
> On Fri 11 Apr 2014 11:34, Friedrich Lindenberg wrote:
>> I got up this morning with three different people pinging me about OS
>> not working. It’s quite apparent that - whatever merits the Amazon
>> hosting solution might have - it does not work in practice. To help
>> with that, I would like to offer my support in moving OS to a new
>> dedicated machine [e.g. 1] next week.
> So finally I have time to reply.
> First of all, thank you Friedrich for this offer. It's a great one, but I
> don't think it's the right action. The problems we have been having are not
> because of Amazon but it is true that they became more apparent after the
> move to Amazon (we had already started experiencing them on Rackspace, and
> quite frankly there was an even bigger concern with us not being able to
> back up our database on Rackspace).
> So what is the problem? We've now finally had some time to spend on
> figuring out what our problem might be and we think we have figured it out.
> The problem is that OpenSpending has grown a lot in the last year which
> has resulted in our search index becoming just under 14GB in size. For
> search to work properly we need all of that in the server's memory plus some
> more. Now the server we're using for our search (solr instance) has 7.5GB
> of memory so it is very badly under-resourced. We would need at least three
> servers like that to keep OpenSpending running.
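The sizing argument above can be checked with a little arithmetic. This is only a sketch: the 14GB index and 7.5GB servers come from the message, but the 25% headroom standing in for "plus some more" is an assumed figure.

```python
import math


def servers_needed(index_gb, server_gb, headroom=0.25):
    """Number of servers whose combined memory holds the index plus headroom.

    `headroom` is an assumed fraction of extra memory on top of the index.
    """
    required_gb = index_gb * (1 + headroom)
    return math.ceil(required_gb / server_gb)


print(servers_needed(14, 7.5))              # 3
print(servers_needed(14, 7.5, headroom=0))  # 2
```

With no headroom at all, two 7.5GB servers would just cover the index, so the figure of three depends on how much extra memory search needs beyond the index itself.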
> Why did we buy such a small server? The reasons are what Rufus hinted at
> in his reply. We need money to run the infrastructure. This is getting very
> costly for a service which is provided for free and this means that we will
> have to look into options of sponsored infrastructure, start charging for
> use somehow or a mix of both.
> So when we decided to buy the server we looked at the old server which was
> hosted on Rackspace. That server had 16GB of memory. Now this memory was
> not solely used by solr, it was shared with our database server (and our
> celery workers). We made the mistake there of assuming that this memory was
> largely needed for the database and decided we wouldn't need as much for
> solr (little did we know that solr actually needed even more memory than
> that). So in order to cut costs and try to stick to the minimum (because we
> don't want Open Knowledge to be paying for resources they don't need to),
> we went with the 7.5GB instance on Amazon.
> It does not help us that when solr runs out of memory and stops
> responding, the OpenSpending code base doesn't handle that gracefully and
> everything just stops working.
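One way to stop a search outage from taking everything else down is to wrap the search call so a failure produces a degraded result rather than an unhandled error. The sketch below is not the OpenSpending code: `safe_search` and `search_fn` are hypothetical names, and `search_fn` stands for whatever function actually queries solr.

```python
def safe_search(search_fn, query, fallback=None):
    """Call search_fn(query); on failure return a degraded result instead of raising.

    `search_fn` is a placeholder for the real search client call.
    """
    try:
        return {"ok": True, "results": search_fn(query)}
    except Exception as exc:
        # Broad on purpose in this sketch; real code should catch only the
        # client's timeout and connection errors.
        return {
            "ok": False,
            "results": fallback if fallback is not None else [],
            "error": str(exc),
        }
```

A page could then check `ok` and show "search is temporarily unavailable" while the rest of the site keeps working. This only helps if the search client also has a timeout set, so that a hung server turns into an error quickly.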
> How are we going to solve this then?
> The short term solution is going to be to add more shards to Solr to increase
> the available memory, i.e. split the index up between a few machines. This
> gives us the option of adding more machines as we grow, or removing machines
> as we shrink.
> Shrink? Yes. The long term solution is going to be to re-think our search
> implementation.
> Do we really need such a big index? I don't think so. We can for example
> stop indexing private datasets. Providing search for datasets that not all
> users have access to (the part of us we can call ClosedSpending) when we're
> struggling with the index size is not something we should be doing.
> We will also have to look into what fields we really want to index and
> store. At the moment that's all fields. I think we can standardise the
> dataset fields to some extent and provide search for those fields and drop
> search for the dynamic fields.
> This results in less functionality around the search, but will actually
> increase comparability of datasets. These are only ideas but we will have
> to make decisions and cuts like these in order to be able to provide
> services users of OpenSpending can rely on.
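The two index-reduction ideas above (skip private datasets, keep only standardised fields) could be combined in the step that prepares documents for indexing. This is a sketch under assumptions: the field names in `STANDARD_FIELDS`, the `private` flag and the shape of the entry dictionaries are all hypothetical, not the real OpenSpending model.

```python
# Hypothetical set of standardised fields to keep in the search index.
STANDARD_FIELDS = {"id", "dataset", "amount", "time", "from", "to"}


def build_index_docs(entries, datasets):
    """Yield trimmed documents for entries that belong to public datasets.

    `entries` is an iterable of dicts with a "dataset" key; `datasets` maps
    dataset names to dicts with an optional "private" flag (assumed shape).
    """
    for entry in entries:
        dataset = datasets.get(entry["dataset"])
        if dataset is None or dataset.get("private", False):
            continue  # unknown or private dataset: do not index
        # Drop dynamic, dataset-specific fields and keep the standard ones.
        yield {k: v for k, v in entry.items() if k in STANDARD_FIELDS}
```

For example, an entry with a custom field `custom_x` in a public dataset would be indexed without that field, and every entry from a private dataset would be skipped entirely.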
> Any help is welcome. This does not have to be around coding. Coding is
> just a very small part of the solution. We need help to look into the
> option of sponsored hosting or how to get OpenSpending money to pay for the
> increasing server power we need. We also need help to rethink our search
> implementation. What can we do to reduce the size of our search index (ideas
> in the same area as discussed above, such as removing private datasets from
> search or removing the possibility of searching all fields)?
> I hope this gives a good update as to where we are and what we as the tech
> community around OpenSpending need to do to fix this big problem.
> Tryggvi Björgvinsson
> Technical Lead, OpenSpending
> The Open Knowledge Foundation <http://okfn.org>
> *Empowering through Open Knowledge*
> http://okfn.org/ | @okfn <http://twitter.com/OKFN> | OKF on Facebook <https://facebook.com/OKFNetwork> |
> Blog <http://blog.okfn.org/> | Newsletter <http://okfn.org/about/newsletter>
Rufus Pollock
Founder and CEO | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
The Open Knowledge Foundation <http://okfn.org/>
Empowering through Open Knowledge
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | OKF on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>