Posted to users@httpd.apache.org by Jason Cox <cs...@gmail.com> on 2008/11/05 01:53:45 UTC
[users@httpd] Re: High "sending reply" count, server stops responding
Bump....
On Thu, Oct 23, 2008 at 6:10 PM, Jason Cox <cs...@gmail.com> wrote:
> I am seeing this issue on my web servers. It all started a couple of
> weeks ago. I looked at my change logs and see no changes that would
> correlate to this happening.
> Here is what happens: a child process reaches MaxRequestsPerChild
> (10000 in this case) and dies off, but the parent process still thinks
> it is running. When I look at the server-status page, those processes
> are shown in a "Sending Reply" state.
>
> Here is a scoreboard from one of my servers:
>
> Scoreboard: WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_____________________W_WWW_________________....................................................................................................................................................................
>
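A scoreboard string like the one above is easier to compare between servers once it is tallied. The following is only a minimal sketch: the legend is the one printed on the Apache 1.3 server-status page, while the function name `tally` is made up for illustration.

```python
from collections import Counter

# Legend as printed on the Apache 1.3 server-status page.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "L": "logging",
    "G": "gracefully finishing",
    ".": "open slot with no current process",
}


def tally(scoreboard):
    """Count scoreboard slots per state (hypothetical helper).

    Whitespace is ignored; characters not in the legend are
    counted under the character itself.
    """
    counts = Counter()
    for ch in scoreboard:
        if ch.isspace():
            continue
        counts[STATES.get(ch, ch)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Paste the full "Scoreboard:" value here; this short string is a stand-in.
    for state, n in sorted(tally("WWWW__W_....").items()):
        print(f"{n:4d}  {state}")
```

Run against the full string, the "sending reply" count can be compared with the number of httpd children that `ps` actually shows.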
> Here is a small part of what I see on the server-status page:
>
> 0-0 21171 1/9999/9999 W 12.31 3081 15 0.0 76.34 76.34
> 10.241.48.117 www.blah.com GET /js/s_code_remote.js HTTP/1.0
> 1-0 21172 1/9995/9995 W 12.65 2719 33 0.0 65.51 65.51
> 10.241.48.149 free.blah.com GET /js/player.js HTTP/1.0
> 2-0 21173 1/9996/9996 W 12.37 1725 17 0.0 67.84 67.84
> 10.241.48.119 free.blah.com GET /js/site_catalyst.js HTTP/1.0
> 3-0 21174 1/9996/9996 W 12.46 2718 3 0.0 66.14 66.14
> 10.241.48.118 free.blah.com GET /js/cookies.js HTTP/1.0
> 4-0 21175 1/9997/9997 W 12.23 2711 95 0.7 70.41 70.41
> 10.241.48.147 free.blah.com GET /playXML/track/13356268?r=0.33 HTTP/1.0
> 5-0 21176 1/9993/9993 W 12.59 2233 7 0.0 65.62 65.62
> 10.241.48.148 free.blah.com GET /js/google_adsense.js HTTP/1.0
> 6-0 21177 1/10000/10000 W 12.41 2367 15 0.0 64.63 64.63
> 10.241.48.117 free.blah.com GET /js/global.js HTTP/1.0
> 7-0 21178 1/9997/9997 W 12.49 2606 3 0.0 63.83 63.83
> 10.241.48.148 free.blah.com GET /js/table_constructor.js HTTP/1.0
> 8-0 21179 1/9997/9997 W 12.32 2900 17 0.0 67.04 67.04
> 10.241.48.117 free.blah.com GET /js/cookies.js HTTP/1.0
>
>
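One way to test the claim that the parent still lists children that have already died is to take the PIDs from the server-status rows and ask the kernel whether each one exists. This is a rough sketch, assuming a Unix host and the row layout shown above (slot, PID, counters); `extract_pids`, `pid_alive` and `stale_pids` are hypothetical names, not part of Apache.

```python
import os
import re

# A status row starts with "slot-generation PID ...", optionally behind "> " quoting.
ROW = re.compile(r"^[> \t]*\d+-\d+\s+(\d+)\s", re.MULTILINE)


def extract_pids(status_text):
    """Return the PIDs found in server-status rows, in order."""
    return [int(pid) for pid in ROW.findall(status_text)]


def pid_alive(pid):
    """True if a process with this PID exists (Unix only).

    Signal 0 performs the existence and permission checks without
    delivering a signal. A zombie (exited but not yet reaped) still
    counts as existing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stale_pids(status_text):
    """PIDs that server-status lists but the kernel no longer knows."""
    return [pid for pid in extract_pids(status_text) if not pid_alive(pid)]


if __name__ == "__main__":
    sample = (
        "0-0 21171 1/9999/9999 W 12.31 3081 15 0.0 76.34 76.34\n"
        "10.241.48.117 www.blah.com GET /js/s_code_remote.js HTTP/1.0\n"
        "1-0 21172 1/9995/9995 W 12.65 2719 33 0.0 65.51 65.51\n"
    )
    print("listed:", extract_pids(sample))
    print("not running here:", stale_pids(sample))
```

PIDs can be reused by unrelated processes, so a "alive" result only says that some process has that PID, not that it is an httpd child.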
> Below is some output I captured from a web server that stopped
> responding to web requests altogether.
>
> [ ~]# ps ax |grep httpd
> 21457 ? Ss 0:01 /usr/local/www/bin/httpd
> 1414 pts/0 S+ 0:00 grep httpd
>
> [ ~]# lsof -p 21457
> COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
> httpd 21457 root cwd DIR 8,1 4096 2 /
> httpd 21457 root rtd DIR 8,1 4096 2 /
> httpd 21457 root txt REG 8,3 501900 471 /usr1/www/bin/httpd
> httpd 21457 root mem REG 0,0 0 [vdso] (stat: No such file or directory)
> httpd 21457 root mem REG 8,1 126648 36049 /lib/ld-2.3.5.so
> httpd 21457 root mem REG 8,1 1489572 36050 /lib/libc-2.3.5.so
> httpd 21457 root mem REG 8,1 25476 106215 /usr/lib/libgdbm.so.2.0.0
> httpd 21457 root mem REG 8,1 27660 36057 /lib/libcrypt-2.3.5.so
> httpd 21457 root mem REG 8,1 196676 36058 /lib/libm-2.3.5.so
> httpd 21457 root mem REG 8,1 125160 106805 /usr/lib/libexpat.so.0.5.0
> httpd 21457 root mem REG 8,1 46552 31966 /lib/libnss_files-2.3.5.so
> httpd 21457 root DEL REG 0,8 983040 /SYSV00000000
> httpd 21457 root 0r CHR 1,3 1326 /dev/null
> httpd 21457 root 1w CHR 1,3 1326 /dev/null
> httpd 21457 root 2w FIFO 0,6 460663664 pipe
> httpd 21457 root 3r FIFO 0,6 460663693 pipe
> httpd 21457 root 4w FIFO 0,6 460663664 pipe
> httpd 21457 root 5w FIFO 0,6 460663666 pipe
> httpd 21457 root 6w FIFO 0,6 460663668 pipe
> httpd 21457 root 7w FIFO 0,6 460663673 pipe
> httpd 21457 root 8w FIFO 0,6 460663674 pipe
> httpd 21457 root 9w FIFO 0,6 460663693 pipe
> httpd 21457 root 10r FIFO 0,6 460663696 pipe
> httpd 21457 root 11w FIFO 0,6 460663696 pipe
> httpd 21457 root 12r FIFO 0,6 460663699 pipe
> httpd 21457 root 13w FIFO 0,6 460663699 pipe
> httpd 21457 root 14r FIFO 0,6 460663702 pipe
> httpd 21457 root 15u IPv4 460663477 TCP *:http (LISTEN)
> httpd 21457 root 16w FIFO 0,6 460663702 pipe
> httpd 21457 root 17r FIFO 0,6 460663705 pipe
> httpd 21457 root 18w FIFO 0,6 460663705 pipe
>
> [ ~]# strace -p 21457
> Process 21457 attached - interrupt to quit
> select(0, NULL, NULL, NULL, {0, 540000}) = 0 (Timeout)
> time(NULL) = 1224806756
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806757
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806758
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806759
> select(19, NULL, [9 11 13 16 18], NULL, {0, 0}) = 5 (out [9 11 13 16 18], left {0, 0})
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806760
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806761
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806762
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806763
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806764
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> time(NULL) = 1224806765
> waitpid(-1, 0xbfc31190, WNOHANG) = 0
> select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>
> Process 21457 detached
>
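For anyone reading the trace above: a `waitpid(-1, ..., WNOHANG)` call that returns 0 means the caller has at least one child and none of them has changed state. With no children at all it fails with ECHILD instead. The sketch below reproduces both outcomes in Python on a Unix host; `poll_children` is a made-up helper name, and the blocking child is just a stand-in for a long-running process.

```python
import os
import subprocess
import sys


def poll_children():
    """Non-blocking check for exited children (hypothetical helper).

    Returns (pid, status) if a child was reaped, (0, 0) if children
    exist but none has exited, or None if there are no children (ECHILD).
    """
    try:
        return os.waitpid(-1, os.WNOHANG)
    except ChildProcessError:
        return None


if __name__ == "__main__":
    # Stand-in child that blocks until its stdin is closed.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=subprocess.PIPE,
    )
    print("child running:", poll_children())

    child.stdin.close()  # let the child exit
    child.wait()         # reap it
    print("after reaping:", poll_children())
```

So the repeated `= 0` results say the parent still had some child process at that moment, even though `ps ax | grep httpd` listed only the parent.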
> Here is the server info. It is FC4:
>
> [ ~]# uname -a
> Linux web10.sd 2.6.17-1.2142smp #1 SMP Thu Sep 14 15:27:50 PDT 2006 i686 i686 i386 GNU/Linux
>
> [ ~]# httpd -V
> Server version: Apache/1.3.34 (Unix)
> Server built: Aug 14 2006 16:11:23
> Server's Module Magic Number: 19990320:18
> Server compiled with....
> -D HAVE_MMAP
> -D HAVE_SHMGET
> -D USE_SHMGET_SCOREBOARD
> -D USE_MMAP_FILES
> -D HAVE_FCNTL_SERIALIZED_ACCEPT
> -D HAVE_SYSVSEM_SERIALIZED_ACCEPT
> -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
> -D DYNAMIC_MODULE_LIMIT=64
> -D HARD_SERVER_LIMIT=256
> -D HTTPD_ROOT="/usr/local/www"
> -D SUEXEC_BIN="/usr/local/www/bin/suexec"
> -D DEFAULT_PIDLOG="logs/httpd.pid"
> -D DEFAULT_SCOREBOARD="logs/httpd.scoreboard"
> -D DEFAULT_LOCKFILE="logs/httpd.lock"
> -D DEFAULT_ERRORLOG="logs/error_log"
> -D TYPES_CONFIG_FILE="conf/mime.types"
> -D SERVER_CONFIG_FILE="conf/httpd.conf"
> -D ACCESS_CONFIG_FILE="conf/access.conf"
> -D RESOURCE_CONFIG_FILE="conf/srm.conf"
>
> Here is the top part of my httpd.conf:
>
> #################################
> ### SECTION 1: Global Environment
> #################################
>
> ServerType standalone
> Port 80
> HostnameLookups Off
> User wwwadmin
> Group wwwadmin
>
>
> Listen "80"
> ServerRoot "/usr/local/www"
> DocumentRoot "/usr/local/www/htdocs"
>
> LockFile /var/lock/httpd.lock
> PidFile logs/httpd.pid
> ScoreBoardFile logs/apache_runtime_status
> Timeout 120
> ExtendedStatus On
> UseCanonicalName On
> ServerSignature Off
> ServerTokens prod
> UserDir disabled
>
> AddDefaultCharset utf-8
>
> ###SERVER TUNING###
>
> KeepAlive Off
> MaxKeepAliveRequests 100
> KeepAliveTimeout 15
> MinSpareServers 10
> MaxSpareServers 20
> StartServers 50
> MaxClients 125
> MaxRequestsPerChild 10000
>
>
> If anyone else has seen this behavior before I would appreciate some
> help. Thanks.
> ---
> Jason Cox
>
--
Jason Cox
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
" from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
[users@httpd] Re: High "sending reply" count, server stops responding
Posted by Jason Cox <cs...@gmail.com>.
Wow, so no help? Does no one have any idea what might be going on? Is
this even the correct forum to ask for help? Would it help if I
mentioned that I cannot reproduce this in my DEV or QA server
environments? Only the production servers are seeing this issue,
and the only difference between Prod and QA/DEV is the server hardware. I
wanted to leave upgrading Apache as my last resort, but if I cannot
get help, I guess that is the only thing left to try right now.
--
Jason Cox