Posted to users@httpd.apache.org by David Fallon <da...@gmail.com> on 2010/06/15 19:02:30 UTC

[users@httpd] Apache getting stuck with all workers in a BUSY_READ state

Hi, I've been having problems with apache becoming unresponsive, and
was wondering if anyone had any suggestions on what the problem might
be. Basically, periodically, apache will get into a state where all
the workers are stuck reading:

Server Version: Apache
Server Built: Oct 21 2009 10:54:43
Current Time: Tuesday, 15-Jun-2010 07:57:30 PDT
Restart Time: Tuesday, 15-Jun-2010 06:37:33 PDT
Parent Server Generation: 0
Server uptime:  1 hour 19 minutes 57 seconds
Total accesses: 985801 - Total Traffic: 8.1 GB
CPU Usage: u644.89 s203.76 cu3994.75 cs0 - 101% CPU load
206 requests/sec - 1.7 MB/second - 8.6 kB/request
1593 requests currently being processed, 15 idle workers
RRRRRRRRRRRRRCRRRRKRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRCRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRKRKRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRWRRRRKKCRRKRKRRRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRC
RRKKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRCKRRRRCCRRKRRRRRRRRRRRRRRRKRRRCRRRRRRRRRRRRRCCRRRCRRCRRR
RRRRRRRKKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRKRRRKRRRRRRRRRRRRRW
KKRRRRRRRKRRRRWRKRRRRRRRRRRRRRRRWRRRRRRRRRRRR___RRR__RR___R_____
WRR__RRRSS......................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
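
For readers who want to tally a scoreboard like the one above, here is a
minimal sketch. It is not part of the original report: the key follows the
legend that mod_status prints under the scoreboard, while the function name
`tally_scoreboard` and the shortened sample string are made up for
illustration.

```python
from collections import Counter

# Legend as printed by mod_status beneath the scoreboard.
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def tally_scoreboard(text):
    """Count workers per state, ignoring whitespace and line breaks."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return {SCOREBOARD_KEY.get(ch, "unknown '%s'" % ch): n
            for ch, n in counts.items()}


# Hypothetical, shortened scoreboard (paste the real one in its place).
sample = "RRRKRRC\nWRR__RRRSS...."
for state, n in sorted(tally_scoreboard(sample).items(), key=lambda kv: -kv[1]):
    print("%6d  %s" % (n, state))
```

Run against the full scoreboard, this gives one number per state, which is
easier to compare across snapshots than the raw character grid.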

This is prior to complete failure - sometimes whatever's blocking gets
unblocked before it hits max clients, sometimes it doesn't. I'm
running apache 2.0.59 built with openssl 0.9.8n on AIX 6.1 with
prefork, and this is virtually all SSL traffic (pretty much everything
other than the scoreboard). A restart basically "fixes" the problem,
in the sense that all the workers get killed and, after the initial
thrashing of starting up new workers, requests are served again.

From my understanding of the READ state above, everything above is
stuck in one of two broad categories:

 - A client made the TCP connection to the server, and is somewhere
between the tcp handshake and the end of the HTTP Request info. This
suggests it could be a network issue (something's hanging the
connections), or an openssl issue (the TLS/SSL negotiation is
slow/hanging), or...?
 - The request has been completed, but we're proxying to somewhere
else and waiting for a response from the proxy. This potentially
applies in this case, because we do have apache setup to proxy some
URLs to another server.

There's nothing in the access or error logs jumping out to correlate
with this problem either. There are MaxClients errors once it hits
that limit, of course, but nothing related to the BUSY_READ state.

While the problem is occurring, I've correlated the scoreboard with the
ps/lsof/netstat output, and the second case seems unlikely because I'm
not seeing any open connections to the server that apache is proxying
to. It feels like there's some shared resource that all the apache
workers are trying to access, but I can't figure out what it might be.
Any suggestions on a solution, or how I might get more info out of
apache as to what it's doing while everyone's in the read state? Are
there other broad categories I'm missing as to why the workers might
be in the read state? Any further info I could provide to help anyone?
My next steps are to dive into the apache source further and see what
possible resources it could be blocking on, but I'm hoping someone
smarter than me already knows. :)
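
As an illustration of the netstat side of the correlation described above,
here is a minimal sketch that tallies TCP states for connections on one
local port. It is an assumption-laden example rather than what was actually
run: the function name `tally_tcp_states` is invented, and it assumes
`netstat -an` style lines of the form "proto recv-q send-q local foreign
state", with the local address written either as "addr:port" or as
"addr.port".

```python
from collections import Counter


def tally_tcp_states(netstat_text, port):
    """Count TCP states for connections whose local address ends in `port`.

    Accepts both "addr:port" and "addr.port" address forms.
    """
    counts = Counter()
    suffixes = (":%d" % port, ".%d" % port)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffixes):
            counts[fields[5]] += 1
    return dict(counts)


# Usage (hypothetical): feed it the saved output of `netstat -an`.
#   with open("netstat.txt") as fh:
#       print(tally_tcp_states(fh.read(), 443))
```

A large ESTABLISHED count on the SSL port that roughly matches the number of
workers in the R state would point at clients that connected but have not
finished sending a request; a mismatch would suggest looking elsewhere.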


-- 
Dave Fallon

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Apache getting stuck with all workers in a BUSY_READ state

Posted by Eric Covener <co...@gmail.com>.
On Wed, Jun 16, 2010 at 9:33 AM, David Fallon <da...@gmail.com> wrote:
> Thanks for the suggestion, but unfortunately I've tried that - truss
> in this case is attaching post whatever it's blocking on (so I just
> see it sleeping), and I haven't yet waited out the problem to see what
> happens when/if whatever's blocking times out. Any other ideas?
>

http://httpd.apache.org/dev/debugging.html

-- 
Eric Covener
covener@gmail.com



Re: [users@httpd] Apache getting stuck with all workers in a BUSY_READ state

Posted by The Gaijin <ga...@gci.net>.
On 06/16/2010 05:33 AM, David Fallon wrote:
> Thanks for the suggestion, but unfortunately I've tried that - truss
> in this case is attaching post whatever it's blocking on (so I just
> see it sleeping), and I haven't yet waited out the problem to see what
> happens when/if whatever's blocking times out. Any other ideas?

pstack and pfiles might be of use.  (Sol 8+ IIRC.)

Good hunting.

R.




Re: [users@httpd] Apache getting stuck with all workers in a BUSY_READ state

Posted by Scott Gifford <sg...@suspectclass.com>.
On Wed, Jun 16, 2010 at 9:33 AM, David Fallon <da...@gmail.com> wrote:

> Thanks for the suggestion, but unfortunately I've tried that - truss
> in this case is attaching post whatever it's blocking on (so I just
> see it sleeping), and I haven't yet waited out the problem to see what
> happens when/if whatever's blocking times out. Any other ideas?
>

truss reports nothing?  Not even that it's waiting in a blocking operation?
On the systems I'm familiar with, at least, that means the process is not
waiting for I/O but is off doing something else.  That would indicate it's
not waiting for a proxy response or a user request, but for something else
altogether.  I'm not sure what your system is, so its truss may behave
differently; you could do some quick experiments if you're not sure either.
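
One possible quick experiment, sketched here under assumptions: open a TCP
connection to a test instance and send nothing, then attach truss to the
worker that accepted it to see what a plain blocked read looks like on your
system. The function name `hold_silent_connection` is invented for this
example, and it assumes that a connection which sends no data leaves a
prefork worker in the read state until the server's timeout expires.

```python
import socket
import time


def hold_silent_connection(host, port, seconds):
    """Open a TCP connection, send nothing, and keep it open for `seconds`.

    Returns the local (address, port) pair so the socket can be matched
    against lsof or netstat output on the server.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        local = sock.getsockname()[:2]
        time.sleep(seconds)
    return local


# Usage (hypothetical host and port; point it at a test server):
#   print(hold_silent_connection("test-server.example", 443, 60))
```

If truss shows the test worker sitting in a read or poll call while the
stuck production workers show something different, that difference is a hint
about what they are really waiting on.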

The suggestion to take a look in a debugger is a good one if your Apache has
debugging symbols.  That may be a good next step.

Good luck!

-----Scott.

Re: [users@httpd] Apache getting stuck with all workers in a BUSY_READ state

Posted by David Fallon <da...@gmail.com>.
Thanks for the suggestion, but unfortunately I've tried that - truss
in this case is attaching post whatever it's blocking on (so I just
see it sleeping), and I haven't yet waited out the problem to see what
happens when/if whatever's blocking times out. Any other ideas?

On Tue, Jun 15, 2010 at 9:59 PM, Scott Gifford
<sg...@suspectclass.com> wrote:
> On Tue, Jun 15, 2010 at 1:02 PM, David Fallon <da...@gmail.com> wrote:
> [ ... ]
>>
>> Any suggestions on a solution, or how I might get more info out of
>> apache as to what it's doing while everyone's in the read state?
>
> I would try using strace (or ktrace or truss depending on your OS) on the
> processes to see what they are doing.  Between that and lsof you should be
> able to tell what the process is blocked reading.
> Hope this is helpful,
> ----Scott.
>



-- 
Dave Fallon



Re: [users@httpd] Apache getting stuck with all workers in a BUSY_READ state

Posted by Scott Gifford <sg...@suspectclass.com>.
On Tue, Jun 15, 2010 at 1:02 PM, David Fallon <da...@gmail.com> wrote:
[ ... ]

> Any suggestions on a solution, or how I might get more info out of
> apache as to what it's doing while everyone's in the read state?


I would try using strace (or ktrace or truss depending on your OS) on the
processes to see what they are doing.  Between that and lsof you should be
able to tell what the process is blocked reading.

Hope this is helpful,

----Scott.