Posted to users@httpd.apache.org by Sreeji K Das <sr...@yahoo.com> on 2002/07/30 12:36:40 UTC

httpd hangs on read() forever on Solaris

Hi all,

I have been facing a strange problem for the past few
weeks. I have searched the net and the archives and
read the FAQ, but no one seems to have reported an
issue like this.

I use Apache-1.3.23 with mod_perl-1.26 and perl-5.6.1
on SunOS 5.6 (SPARC).

All was going well until we moved from
Apache-1.3.6/mod_perl-1.19 to the above versions. Now
most of the httpd children hang after serving some
requests.
Following is a truss of a hanging process:

$ truss -p 6454
read(52, 0x001FF898, 4096)      (sleeping...)
lwp_cond_wait(0xEF48B340, 0xEF48B350, 0xEF425CA0) Err#62 ETIME
time()                                          = 1028021872
read(52, 0x001FF898, 4096)      (sleeping...)
lwp_cond_wait(0xEF48B340, 0xEF48B350, 0xEF425CA0) Err#62 ETIME
time()                                          = 1028022172
read(52, 0x001FF898, 4096)      (sleeping...)

The above goes on indefinitely, i.e. read() blocks for
300 secs, then an lwp_cond_wait() times out with ETIME
and read() is entered again and blocks. I have seen
processes stay in this state for more than 3 days;
they never get killed. After about 3 or 4 days, the
max. process limit is hit and we need to manually truss
and kill all hanging children.
I have verified with 'lsof' that fd 52 is the client
socket.
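
As a quick check of the 300-second figure, the gap
between the two time() values in the truss output can be
computed directly. This is only a small Python sketch;
the variable names are mine and purely illustrative.

```python
from datetime import datetime, timezone

# time() return values copied from the truss output above
timestamps = [1028021872, 1028022172]

# Seconds between consecutive lwp_cond_wait() timeouts
intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
print(intervals)  # [300]

# The same instants as UTC wall-clock times, handy for matching against server logs
for ts in timestamps:
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```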

Notes:
1) Not all processes go into this state, but a
considerable number of them do.
2) Some processes handle as many as 400 requests
before hanging, while others handle as few as 30
requests before entering the infinite loop.
3) This does not pertain to any particular request. I
analysed the requests handled by many hanging and
non-hanging processes and found no pattern.
4) This may not have anything to do with the client,
since:
 a) All of this was working well with the old
apache/mod_perl combo.
 b) I see almost 70 processes hanging after 3 or 4 days
(MaxClients is 100). It's unlikely that all of them
dealt with faulty browsers!

Following is some information I've collected on the
hanging process:

Stack trace
--------------------------------------------
$ pstack 6454
6454:   /bin/httpd -f /test/conf/httpd.conf
lwp#1 ----------
 ef5b8708 read     (34, 1ff898, 1000)
 ef5b8708 _libc_read (34, 1ff898, 1000, effff424, ef623700, 8e3c0) + 8
 ef475bd8 _ti_read (1ff850, 1ff898, 1000, 0, ef623700, aacb4) + 34
 0008e428 buff_read (1ff850, 1ff898, 1000, 0, 0, 0) + 28
 0008f0c8 saferead_guts (1ff850, 1ff898, 1000, 0, 0, 0) + 50
 0008f170 read_with_errors (1ff850, 1ff898, 1000, ffffffff, fffffff8, 2146f6c) + 28
 0008f7c4 ap_bgets (efffd32c, 2000, 1ff850, efffedb8, 0, 0) + 10c
 000afc84 getline  (efffd32c, 2000, 1ff850, 0, fffffff8, 21469e0) + 3c
 000b0240 read_request_line (2146840, 2146840, 1, ef7017d4, 4, 1) + c8
 000b0f7c ap_read_request (20cf820, 1ef040, 1ff850, effff424, effff434, 10) + 2bc
 000aa8b0 child_main (10, a7eb0, ef488390, 0, ef623700, aacb4) + af8
 000aadbc make_child (1ef040, 10, 3d44e004, ef6266cc, ef6266cc, 0) + 21c
 000ab41c perform_idle_server_maintenance (ffffffff, 0, 0, 1ef040, 1b63e4, 1e456c) + 494
 000abe90 standalone_main (3, effff73c, 1d5b88, 66, ef626228, ef626514) + 710
 000acb50 main     (3, effff73c, effff74c, 1d2c00, 0, 0) + 788
 00033dd4 _start   (0, 0, 0, 0, 0, 0) + dc
lwp#2 ----------
 ef5b9958 signotifywait ()
 ef46be0c _dynamiclwps (ef486c40, 54, 0, 0, 0, ef7efda8) + 1c
 ef60764c thr_errnop (0, 0, 0, 0, 0, 0) + 24
lwp#4 ----------
 ef5b9810 lwp_sema_p (ee70de78)
 ef5b9810 __lwp_sema_wait (ee70de78, 0, 0, 0, 0, 0) + 8
 ef467e98 _park    (ee70ddd0, ee70de78, 0, 1, ef487c60, 0) + 10c
 ef467b7c _swtch   (5, ef486c40, ee70de54, ee70de50, ee70de4c, ee70de48) + 360
 ef46aff8 _reap_wait (ef488800, ef48b6c0, 0, 0, 0, 0) + 34
 ef46ad84 _reaper  (ef486c40, ef488800, ef487cb0, ef490b94, 1, fe401000) + 34
 ef476590 _thread_start (0, 0, 0, 0, 0, 0) + 40
lwp#6 ----------
 ef5b97c4 lwp_cond_wait (ef48b340, ef48b350, ef425ca0)
 ef5ea7d4 _lwp_cond_timedwait (ef48b340, ef48b350, 0, 3d46631f, 0, 0) + 90
 ef46755c _age     (ef486c40, ef487c54, ef488378, ef488390, 3, ef486c40) + 90
 ef468a60 _lwp_start (0, 1, 6000, effff304, 35, 0) + 14
 ef60764c thr_errnop (0, 0, 0, 0, 0, 0) + 24
----------------------------------------------
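
The read() frame in lwp#1 looks like the same call that
truss shows, assuming pstack prints its arguments in
hexadecimal (the buffer address 1ff898 matching
0x001FF898 suggests so). A small Python sketch of that
cross-check, with illustrative variable names:

```python
# Arguments of the top read() frame in lwp#1 as printed by pstack (assumed hex)
pstack_fd, pstack_len = "34", "1000"

# Arguments of the read() call as printed by truss (decimal)
truss_fd, truss_len = 52, 4096

fd = int(pstack_fd, 16)
length = int(pstack_len, 16)
print(fd, length)  # 52 4096

# True when both tools are describing the same read(fd, buf, len) call
same_call = (fd, length) == (truss_fd, truss_len)
print(same_call)  # True
```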

a) I could never reproduce this on my development
machines. This seems to happen only under high load.
b) I cannot do much debugging on my production server.
The server is not compiled with -g (and cannot be :-( )
so gdb/dbx won't work.
c) I don't have a 'truss' report showing what a process
did before entering the hung state. I can't truss
processes for a long period on the production server.

The only other place where I found a similar issue
reported is here:
http://www.apachelabs.org/httpd-users/200201.mbox/%3c20020120132809.A7125@leeor.math.technion.ac.il%3e
However, that posting did not receive any
responses.

Has anyone else faced the same or a similar issue? Can
anyone offer a clue for further debugging (I sent a
SIGBUS, but no core was dumped!) or a solution? Is
there a tool with which I can simulate high load?
(I just downloaded JMeter; any other suggestions?)
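
To make clear what I mean by simulating load, here is a
rough Python sketch that fires many concurrent GET
requests at one URL and tallies the results. The names
fetch and run_load, the default counts, and the URL in
the usage line are all hypothetical; it is not a
substitute for a real load-testing tool, and I have no
evidence that it reproduces the hang.

```python
import sys
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10.0):
    """Issue one GET; return the HTTP status, or the exception name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except Exception as exc:  # timeouts, refused connections, HTTP error statuses
        return type(exc).__name__


def run_load(url, total_requests=200, concurrency=20):
    """Send total_requests GETs using `concurrency` worker threads; tally outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, [url] * total_requests))


# Runs only when a URL is given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    print(run_load(sys.argv[1]))
```

Usage would be something like
"python load.py http://testserver/some/page" against a
test machine (the file name and URL are placeholders).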

Thanks for any response.
Sreeji
