Posted to dev@httpd.apache.org by Graham Leggett <mi...@sharp.fm> on 1999/05/17 13:41:22 UTC

httpd processes getting stuck

Hi all,

I am currently having a problem with httpd, and I'm a bit baffled. Over
time, httpd processes are getting "stuck": they no longer respond to
requests. To make up for this, Apache spawns new httpd children to do
the job of the stuck ones. Eventually the box runs out of RAM and
problems ensue.

We rotate our logfiles once per hour, and when we do a graceful restart,
we get the following logged:

[Mon May 17 13:00:03 1999] [warn] child process 16267 did not exit, sending another SIGHUP
...lots of these
[Mon May 17 13:00:04 1999] [warn] child process 16267 still did not exit, sending a SIGTERM
...lots of these
[Mon May 17 13:00:09 1999] [error] child process 16267 still did not exit, sending a SIGKILL
...lots of these
[Mon May 17 13:00:25 1999] [notice] SIGHUP received.  Attempting to restart
[Mon May 17 13:00:30 1999] [notice] Apache/1.3.7-dev (Unix) configured -- resuming normal operations
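
To see how far the parent had to escalate for each child, the messages above can be tallied per PID. This is only a rough sketch: the pattern is written against the three message wordings quoted here and is not checked against other Apache versions, and signals_by_pid is a made-up helper name.

```python
import re
from collections import defaultdict

# Matches the three escalation messages quoted in the log above.
PATTERN = re.compile(
    r"child process (\d+) (?:still )?did not exit, "
    r"sending (?:another |a )?(SIG[A-Z]+)"
)

def signals_by_pid(log_text):
    """Map each child PID to the signals the parent sent it, in log order."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    result = defaultdict(list)
    for pid, sig in PATTERN.findall(flat):
        result[int(pid)].append(sig)
    return dict(result)

sample = """\
[Mon May 17 13:00:03 1999] [warn] child process 16267 did not exit,
sending another SIGHUP
[Mon May 17 13:00:04 1999] [warn] child process 16267 still did not
exit, sending a SIGTERM
[Mon May 17 13:00:09 1999] [error] child process 16267 still did not
exit, sending a SIGKILL
"""

print(signals_by_pid(sample))
```

Any PID whose list ends in SIGKILL ignored both SIGHUP and SIGTERM, which is consistent with a child blocked in a system call.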

If I take a look at the extended output of /server-status, I see a
number of entries like this one. The number of these entries increases
over time, and goes back to zero after the restart/cleanup above.

53-30 20835 0/28/28 R 0.15 1219 58092 0.0 1.35 1.35  ? ? ..reading..
                           ^^^^ seconds since start of request, showing it's stuck

These stuck processes eventually force the server to reach the maximum
server limit.
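
Entries like the one above could be picked out of the status page automatically. This is a minimal sketch, assuming the extended status columns are in the order Srv, PID, Acc, M, CPU, SS, Req, Conn, Child, Slot, Client, VHost, Request, and that the rows have already been reduced to plain text. The find_stuck name and the 300 second threshold are invented for illustration.

```python
# Assumed column order of one extended /server-status row:
# Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
def find_stuck(lines, min_seconds=300):
    """Return (pid, seconds) for slots in state R at least min_seconds old."""
    stuck = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[1].isdigit():
            continue  # skip blank lines, headers, and annotations
        pid, mode, seconds = fields[1], fields[3], fields[5]
        if mode == "R" and seconds.isdigit() and int(seconds) >= min_seconds:
            stuck.append((int(pid), int(seconds)))
    return stuck

sample = [
    "53-30 20835 0/28/28 R 0.15 1219 58092 0.0 1.35 1.35  ? ? ..reading..",
]

print(find_stuck(sample))
```

The PIDs it returns are the ones worth inspecting further with pstack or truss.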

If I run pstack on the stuck httpd process, it seems to be sitting
blocked in a read() call. I have no idea what it's waiting for:

[1:30pm] root@infobase3:~# /usr/proc/bin/pstack 20835
20835:	/opt/local/apache/bin/httpd
lwp#1 ----------
 ef5b8598 read     (3, 1acdd8, 1000)
 ef5b8598 _libc_read (3, 1acdd8, 1000, effff3d8, ef622eb4, 1bfb0) + 8
 ef1d5a9c _ti_read (1acd98, 1acdd8, 1000, 0, 0, 0) + 30
 0001e85c buff_read (1acd98, 1acdd8, 1000, 0, 0, 0) + 1c
 0001e7cc saferead_guts (1acd98, 1acdd8, 1000, ffffffff, fffffff8,
2e1df8) + 40
 0001c824 read_with_errors (1acd98, 1acdd8, 1000, 7ffffffd, efffd2ac,
efffd248) + 1c
 0001ccf0 ap_bgets (efffd2d0, 2000, 1acd98, 8, 109, 87d) + d0
 000327e4 getline  (efffd2d0, 2000, 1acd98, 1, 53c00, 2e0cc0) + 2c
 000330d4 get_mime_headers (2e0b38, 2e0b38, 22e030, 9ffffc00, effffc00,
1) + 84
 00033704 ap_read_request (6dba08, 6bc00, 1acd98, effff3d8, effff3e8,
35) + 298
 0002f230 child_main (35, 2d60c, 2d400, 6e860, ef622eb4, 2f51c) + 720
 0002f5dc make_child (6e860, 35, 373ff6d7, 6e860, 538a0, 20) + 190
 0002faf4 perform_idle_server_maintenance (0, effff614, e, 6e860, 53a38,
4ea60) + 3c4
 00030254 standalone_main (1, effff74c, 0, 0, ef6259c0, ef625c4c) + 4b0
 00030b30 main     (1, effff74c, effff754, 6c4a0, 0, 0) + 524
 000179b8 _start   (0, 0, 0, 0, 0, 0) + 5c
lwp#2 ----------
 ef5b97e8 signotifywait ()
 ef1cbd4c _dynamiclwps (ef1e6ad0, 1, 20000, fffdffff, 0, 776d8) + 1c
 ef606e4c thr_errnop (0, 0, 0, 0, 0, 0) + 24
lwp#5 ----------
 ef5b9654 lwp_cond_wait (ef1eb1b8, ef1eb1c8, ef305ca8)
 ef5ea2c4 _lwp_cond_timedwait (ef1eb1b8, ef1eb1c8, 0, 373fff0c, 0, 0) +
90
 ef1c74c4 _age     (ef1e6ad0, ef1e7acc, ef1e81f0, ef1e8208, 3, ef1e6ad0)
+ 90
 ef1c89c8 _lwp_start (0, 1, 6000, effff3ec, 5, 0) + 14
 ef606e4c thr_errnop (0, 0, 0, 0, 0, 0) + 24
lwp#4 ----------
 ef5b96a0 lwp_sema_p (eef0de78)
 ef5b96a0 __lwp_sema_wait (eef0de78, 0, 0, 0, 0, 0) + 8
 ef1c7e00 _park    (eef0ddd8, eef0de78, 0, 1, ef1e7ad8, 0) + 10c
 ef1c7ae4 _swtch   (5, ef1e6ad0, eef0de58, eef0de54, eef0de50, eef0de4c)
+ 360
 ef1caf60 _reap_wait (ef1e8678, ef1eb530, 0, 0, 0, 0) + 34
 ef1cacec _reaper  (ef1e6ad0, ef1e8678, ef1e7b28, ef1f0a04, 1, fe400000)
+ 34
 ef1d641c _thread_start (0, 0, 0, 0, 0, 0) + 40
lwp#6 ----------
 ef5b96a0 lwp_sema_p (ef1e84d8)
 ef5b96a0 __lwp_sema_wait (ef1e84d8, 61266708, 0, eee0bd4c, ef622eb4,
ef1c723c) + 8
 ef1c7220 _co_timerset (ef1e84c8, ef1e7a60, ef1e6ad0, ef1e8208, 3,
ef1e6ad0) + f4
 ef1d641c _thread_start (0, 0, 0, 0, 0, 0) + 40
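
A trace like the one above is easier to read with just the function names for one LWP pulled out. The following is only a sketch: the regular expression assumes frames look exactly like the pstack lines shown here (eight hex digits, a function name, then an opening parenthesis), and call_chain is a made-up helper name.

```python
import re

# A frame line looks like: " 0001e85c buff_read (1acd98, 1acdd8, ...) + 1c"
FRAME = re.compile(r"^\s*[0-9a-f]{8}\s+(\w+)\s*\(")

def call_chain(pstack_text, lwp="lwp#1"):
    """Return the function names for one LWP, innermost frame first."""
    names, active = [], False
    for line in pstack_text.splitlines():
        if line.startswith("lwp#"):
            # A header such as "lwp#1 ----------" starts a new section.
            active = line.split()[0] == lwp
            continue
        match = FRAME.match(line)
        if active and match:
            names.append(match.group(1))
    return names

sample = """\
lwp#1 ----------
 ef5b8598 read     (3, 1acdd8, 1000)
 0001ccf0 ap_bgets (efffd2d0, 2000, 1acd98, 8, 109, 87d) + d0
 000327e4 getline  (efffd2d0, 2000, 1acd98, 1, 53c00, 2e0cc0) + 2c
 00033704 ap_read_request (6dba08, 6bc00, 1acd98, effff3d8, effff3e8,
35) + 298
lwp#2 ----------
 ef5b97e8 signotifywait ()
"""

print(" <- ".join(call_chain(sample)))
```

Wrapped continuation lines such as "35) + 298" do not match the frame pattern, so they are ignored rather than mistaken for frames.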

Platform: Solaris v2.6
Apache: v1.3.7-dev (19990510071220)

Any ideas on where I should be looking for a solution?

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight...


httpd processes getting stuck - debug help

Posted by Graham Leggett <mi...@sharp.fm>.
Hi all,

I am trying to further debug my problem with stuck httpds, and have got
this far:

- I've identified the httpds which are stuck
- I've attached to one with truss and got this:

read(3, 0x0007A408, 4096)	(sleeping...)

Now I need to work out exactly where in the Apache source the above
read() call is made, so that I can start working backwards to find the
problem. Trouble is, I am pretty clueless when it comes to debug tools
and how to use them, so I would be hugely grateful for a few pointers.
Armed with gdb and an Apache installation running on a Solaris v2.6
system, is it possible to find out where in the source code a given
process currently is?

Regards,
Graham