Posted to dev@httpd.apache.org by Cliff Woolley <jw...@virginia.edu> on 2002/07/13 19:05:28 UTC

trace of apache (fwd)


---------- Forwarded message ----------
Date: Sat, 13 Jul 2002 11:02:25 -0500
From: David Cook <dc...@cookware.com>
To: jwoolley@apache.org
Subject: trace of apache

> re: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10266
>     apache hangs after some hours of running
>
>------- Additional Comments From jwoolley@apache.org  2002-07-09 22:06 -------
>We've had some other vague reports of this kind of behavior, but none of us
>have seen it that I know of.  Can you please take one of the misbehaving child
>processes and attach to it with gdb and give us a backtrace?  An strace might
>be helpful as well.  See http://httpd.apache.org/dev/debugging.html

J... I was able to catch a trace of what happens from the primary (parent)
process. Also, after the trace, I've included a couple of other thoughts
on it, since I saw it behave slightly differently this time:

23163:	/cookware/web/apache/bin/httpd -d /cookware/web/apache -f /cookware/we
-----------------  lwp# 1 / thread# 1  --------------------
 ff09a254 poll     (ffbef678, 1, bb8)
 ff04cf8c select   (3c, 0, 0, ffbef680, ffbef8d8, ffbef678) + 348
 fefab13c select   (23f0e0, 0, 10, 1, ffffffff, ff09c764) + 34
 ff2f2ad4 apr_connect (23f0e0, 278fe0, 2dc6c0, 23f0e0, ff0bd1a4, ffbefa58) + fc
 00082390 dummy_connection (0, 169c00, a, 100, 1e, 1) + c4
 00075000 perform_idle_server_maintenance (176e70, 1, ffbefbd0, 176e70, 179ea8, 1308a0) + 1e8
 00075618 ap_mpm_run (ffbefbd0, 1aef50, 169c00, 179ea8, ff37f6e8, 0) + 5e8
 0007ae44 main     (179ea8, 176e70, ffbefcfc, 16cf88, 0, 0) + 5d8
 0003d35c _start   (0, 0, 0, 0, 0, 0) + 5c
-----------------  lwp# 2 / thread# 2  --------------------
 ff09ba18 signotifywait ()
 fef9ed90 _dynamiclwps (fefbe000, 0, 0, 0, ffbef984, 4) + 1c
 fefa206c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-----------------  lwp# 3  --------------------------------
 ff09c07c lwp_cond_wait (fefc5550, fefc5560, fefbedb8)
 fef990dc _age     (3e, fefbeda4, fefbe000, 0, 0, ff09c764) + 74
 fefa6c00 _sc_door_func (ffffffff, fefbf690, fefbf6a8, 3, fefbe000, 1) + 74
 fef9a770 _lwp_start (fef85d70, 0, 6000, ffbef95c, 0, 0) + 18
 fefa206c thr_yield (0, 0, 0, 0, 0, 0) + 8c
--------------------------  thread# 3  --------------------
 fef9ddf8 _reap_wait (fefc29e8, 204e4, 0, fefbe000, 0, 0) + 38
 fef9db50 _reaper  (fefbee38, fefc4748, fefc29e8, fefbee10, 1, fe401000) + 38
 fefab730 _thread_start (0, 0, 0, 0, 0, 0) + 40
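
A note on reading the first frame of the trace. This is a small sketch, not part of the original report. It assumes the trace tool prints arguments in hexadecimal, that the process is 32-bit (the addresses shown are 32 bits wide), and that the frame is the standard poll(fds, nfds, timeout) call with the timeout in milliseconds. The helper names are hypothetical, and argument values shown for deeper frames may not be reliable.

```python
# Hex arguments copied from the "poll" frame of lwp# 1 in the trace above.
POLL_ARGS_HEX = ["ffbef678", "1", "bb8"]


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed int (poll's timeout is an int)."""
    return value - (1 << 32) if value & 0x80000000 else value


def decode_poll_args(args_hex):
    """Interpret three hex strings as poll(fds, nfds, timeout_ms)."""
    fds_addr, nfds, timeout = (int(arg, 16) for arg in args_hex)
    return {
        "fds_addr": hex(fds_addr),           # pointer to the pollfd array
        "nfds": nfds,                        # number of descriptors polled
        "timeout_ms": to_signed32(timeout),  # negative would mean "wait forever"
    }


decoded = decode_poll_args(POLL_ARGS_HEX)
print(decoded)
```

Under those assumptions, bb8 hex is 3000, so the parent appears to be waiting on a single descriptor with a 3-second timeout inside apr_connect rather than blocking indefinitely.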



In my posting online I mentioned it seems to happen roughly every 24 to
48 hours. However, yesterday, in a preemptive move to have it not happen
since I was leaving the office, I stopped and restarted the servers, assuming
it would be at least 24 hours until it happened again.  I waited about 15
minutes and then noted that it had already hung.... so this is NOT something
that happens after a *certain* amount of time, but instead, can happen just
about any time.

Additionally... as I pointed out earlier (online), of the 3 apaches we run on
a single server, only ONE of them consistently hangs. It is also the only
apache that handles SSL for us (don't know if that has anything to do with
it), and it serves many virtual hosts.

Finally... I had written an alarm on my local computers to monitor the remote
computer to see if it was in this state. I had assumed that it was hanging
on the open to the remote, so I put a timer around the open. However, when
the above-mentioned hang occurred... my monitor alarms did not go off and
instead, when I analyzed my monitors, it appeared that the open had
succeeded but it was hung in the WRITE to the server (or the read of
the response, unsure which).
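
A monitor that times each step separately would show which step stalls. The following is a minimal sketch, not the monitor described above. The function name probe_http, the 10-second default and the HTTP/1.0 request line are all assumptions chosen for illustration. It puts a timeout on the connect, the write and the read, and reports the phase that failed.

```python
import socket
import time


def probe_http(host, port=80, timeout=10.0, path="/"):
    """Return (phase, elapsed_seconds, detail).

    phase is "ok" on success, otherwise the step that failed:
    "connect", "write" or "read".
    """
    start = time.monotonic()
    phase = "connect"
    sock = None
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.settimeout(timeout)  # same limit for each later send/recv call
        phase = "write"
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        phase = "read"
        first = sock.recv(1024)
        elapsed = time.monotonic() - start
        if not first:
            return ("read", elapsed, "connection closed with no data")
        status_line = first.split(b"\r\n", 1)[0].decode("latin-1")
        return ("ok", elapsed, status_line)
    except socket.timeout:
        return (phase, time.monotonic() - start, "timed out")
    except OSError as exc:
        return (phase, time.monotonic() - start, str(exc))
    finally:
        if sock is not None:
            sock.close()


if __name__ == "__main__":
    # Local stand-in for a hung server: it listens but never accepts or replies.
    silent = socket.socket()
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)
    print(probe_http("127.0.0.1", silent.getsockname()[1], timeout=0.5))
    silent.close()
```

Against a listener that completes the TCP handshake but never answers, the connect and write steps should succeed and the probe should give up in the read phase, which would match the behavior described here.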

Now that I think about it, this jibes with my other experiences... that is...
when the server is in this hung state, if I telnet to port 80 and issue a
"GET /", I actually get a connection on the port and it accepts the GET, but
nothing ever comes back from the server.

One more thing... we have seen this condition clear by itself. That is... it might
be hung for 1000 seconds or so, then, all of a sudden, everything is all right
again without us doing anything at all.

This leads me to believe that it might be related to the socket read/write
portions of the code.

Aloha

-- 
David Cook -- Cookware Inc. -- dcook@cookware.com
TQworld and tranquility:  www.TQworld.com
Cookware Corporate:       www.cookwareinc.com
Hawaii/Asia Office: (808) 966-5049 (david cook)
Mainland US Office: (317) 769-5049 (deborah sellers)

Have you had tranquility today? Play now at http://www.tqworld.com