[okfn-discuss] Re: [okfn-help] blog off

Francis Irving francis at flourish.org
Fri Jan 26 13:52:51 UTC 2007


On Fri, Jan 26, 2007 at 01:37:57PM +0000, Rufus Pollock wrote:
> Jo Walsh wrote:
> >blog.okfn.org is responding to pings, but hanging indefinitely on port 80.
> 
> hmmm. I've now been monitoring the okfn.org server for a couple of weeks 
> and loads are getting very high. Frequently I have seen it in the 5-30 
> range and I've seen it as high as 80 (at that point the server is 
> essentially dead). The problem seems to lie with apache (the main way 
> I've been solving the problem is to stop/start or restart apache).

Anything above 1 or 2 at any point is worrying. (By which I mean it
would worry me, as it means the server isn't keeping up.)
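
One caveat to that rule of thumb: the load average counts processes
across the whole machine, so it is usually read relative to the number
of CPUs. A rough sketch in Python, assuming a Unix-like system (the
function names are made up for illustration):

```python
import os

def load_per_cpu(load_average, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count

def current_load_per_cpu():
    """Return the 1, 5 and 15 minute load averages, each per CPU."""
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    # os.getloadavg() is only available on Unix-like systems.
    return tuple(load_per_cpu(value, cpus) for value in os.getloadavg())
```

A per-CPU value that stays well above 1 means processes are queueing
rather than being served.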
 
> However I've had difficulty working out exactly what the problem is. I 
> thought it might be to do with a memory leak in modpython stuff related 
> to knowledgeforge and therefore have tried upgrading modpython. I only 
> did this a few days ago so it is difficult to know how much impact this 
> is having yet. Alternatively it may just be that the machine is getting 
> too much traffic.
> 
> If there's anyone out there with suggestions or experience of tracking 
> down these kinds of problems let me know (I'd also like to know why, for 
> example, htop and top produce such different process lists ...). I've 
> got a load of top logs and other system stats which I can post on the 
> sysadmin wiki if that would help

High load is caused by one of:

1) High CPU use. When the load was high, what did top show for CPU
use, total and particular processes? Usually high load is not due to
CPU, these days.

2) Is it swapping? Run "vmstat 1" for a bit. If "si/so" are constantly
above 0, then it is actively swapping a lot. Also run "free" and look
at how much swap is in use (although swap being in use doesn't by
itself mean it is swapping right now).

3) Is it disk i/o bound? Again, using vmstat look at the "bi/bo"
columns to see if it is constantly doing work.

4) (I think this is possible, but don't really know) Network
saturation. Sorry, I don't know how to test for this. It is very
unlikely.
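
The vmstat checks in 2) and 3) can be roughly automated. This is only
a sketch, assuming the usual Linux "vmstat" layout with a header row
naming the si, so, bi and bo columns; the function name is made up.
It skips the first sample, because vmstat's first report is an average
since boot rather than a current reading.

```python
def summarise_vmstat(text):
    """Average the si, so, bi and bo columns of `vmstat 1` output."""
    wanted = ("si", "so", "bi", "bo")
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(name in fields for name in wanted):
            # Header row: remember where each column lives.
            columns = {name: fields.index(name) for name in wanted}
            continue
        if columns is None or not all(f.isdigit() for f in fields):
            continue  # banner line, or data seen before any header
        samples.append({name: int(fields[i]) for name, i in columns.items()})
    recent = samples[1:]  # drop the since-boot sample
    if not recent:
        raise ValueError("need at least two vmstat samples")
    return {name: sum(s[name] for s in recent) / len(recent) for name in wanted}
```

Feed it the captured output of something like "vmstat 1 10". Averages
for si and so that stay above 0 point at swapping; large bi and bo
averages point at disk i/o.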

If the problem is:

1) You'll know which process it is, and can optimise it, or reduce
the amount it is running.

2) Then you need to find out what is using the memory (top and sort by
memory helps a bit, but I'm still looking for a good memory profiling
tool to say what user is using up memory, allowing properly for shared
pages). Solve by either reducing memory use, or getting more RAM put
in (I wouldn't hesitate to do the latter if you're running lots of
stuff).

3) Need to find out what is doing all the disk i/o. Using lsof might
help. Again, I don't know of a command to just say what is doing it
(I bet there is one).

4) Unless you're being attacked, it means you need more bandwidth.
Or you can do things like slow down Google's crawler if it has gone
mad on a site.
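
For 2), a crude way to see which user is using the memory is to add up
resident set sizes per user. A sketch, assuming the output of
"ps -eo user,rss" (RSS in KiB); the function name is made up. It does
not solve the shared pages problem: RSS counts a shared page once for
every process that maps it, so totals for something like a pool of
apache children will be overstated.

```python
from collections import defaultdict

def rss_by_user(ps_output):
    """Sum the RSS column (KiB) per user from `ps -eo user,rss` output."""
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # header row or anything unexpected
        user, rss_kib = fields
        totals[user] += int(rss_kib)
    # Largest consumers first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Treat the result as a ranking of where to look first rather than as an
exact account of memory use.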

I can ask more questions if you get more information... (e.g. I will
want to know what databases you have and how big they are and so on,
if the problem is swap or I/O)

Francis



