[ckan-dev] Solr out of memory

Ross Jones ross at servercode.co.uk
Sun Dec 6 10:51:31 UTC 2015


Hi,

> On 6 Dec 2015, at 09:51, Carl Lange <carl at derilinx.com> wrote:
> Recently we moved from an m3.large aws instance to a 4GB linode, having discovered that we didn't require nearly as much power as we thought. However, since then, our Solr core has been causing some major issues. To begin with, it runs out of memory very quickly (with a JavaOutOfMemory exception). Other times, I simply get a 500 from solr, with no other info. I have only about 1200 datasets - surely I don't need more than 4GB of RAM to search this?
> Could anyone point me in the direction of the best way to debug these issues? I find myself restarting jetty every ten minutes in order to get my search back, which is a little unsustainable ;)

You may need to just give the JVM access to more heap; the thing to investigate in the Jetty config is JAVA_OPTIONS (I think), increasing the heap size using the -Xmx option. This might buy you some time to investigate, so you don't have to sit there running sudo service ... restart every few minutes.
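For example, something along these lines in the Jetty defaults file (the path and the exact variable name vary by distro and Jetty version, so treat this as a sketch rather than a drop-in):

    # /etc/default/jetty (location is an assumption -- check your install)
    # Give Solr a 2 GB heap, fixed at startup so the JVM doesn't resize it,
    # leaving the rest of the 4 GB box for the OS, CKAN and Postgres.
    JAVA_OPTIONS="-Xms2g -Xmx2g"

Then restart Jetty and watch whether the OutOfMemory errors stop or just take longer to appear; the latter would point at something growing without bound rather than a heap that's simply too small.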

> The problem seems to have manifested itself after google decided to crawl all our search pages yesterday, 1 per second. I've brought this rate down to 0.3 a second, which has helped a little. Prior to this, we had reasonably stable search for about a month.

I feel your pain.  We often get Google, Baidu and other smaller spiders all deciding to spider the site at the same time, and when something is trying to index every single page, Varnish isn't much use. We also often get harvested at ridiculous request rates (dozens of requests per second from a single IP), so it may be worth checking someone isn't just hammering the site really hard.
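A quick way to check is to count requests per client IP in your web server's access log; something like this (the log path is an assumption -- adjust for your setup):

    # Top 20 client IPs by request count in the Nginx access log
    awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

If one or two IPs dominate, rate-limiting or blocking them may fix your Solr problem without touching the heap at all.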

> I notice that there are quite a large number of indexed fields in schema.xml - are these all necessary? Same goes for stored fields. (I'm using a slightly modified version of the data.gov.uk schema).

I don't think so, and it is probably long overdue for a bit of pruning.  Storing the validated dict in Solr probably doesn't help with document size either.  You might try testing without that dict, and with use_cache: False passed to the package_show action (in the context) so that it doesn't depend on it being in Solr. It might be that CKAN is now fast enough to not rely on it any more.
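To be clear about what I mean, here's a minimal sketch, assuming you're calling the action from extension code (the user and dataset id are made up):

    import ckan.plugins.toolkit as toolkit

    # use_cache=False in the context makes package_show rebuild the dict
    # from the database instead of using the validated dict cached in Solr.
    context = {'user': 'some-user', 'use_cache': False}
    dataset = toolkit.get_action('package_show')(context, {'id': 'my-dataset'})

If that still performs acceptably, dropping the stored dict from the schema becomes a much safer experiment.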

> Really though, I'm just shooting in the dark - I don't know if it's my schema, or if it's anything else, and so some info on how to debug this would be great.

I'd measure exactly how much RAM your Solr instance currently has access to, how much of it is actually being used, and investigate document sizes.
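A couple of starting points (the URL and data directory path are from memory, so double-check them against your Solr version and config):

    # Solr's system info endpoint reports JVM heap usage under "jvm" -> "memory"
    curl 'http://localhost:8983/solr/admin/info/system?wt=json'

    # Index size on disk gives a rough feel for total document size
    du -sh /var/lib/solr/data

If the heap sits close to -Xmx under normal load, you genuinely need more memory; if it only fills up during a crawl, large documents plus lots of uncached queries is the likelier culprit.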

I'm very keen on reducing CKAN's footprint and improving performance, so if you do find something, please do let us know.

Ross



