Posted to users@httpd.apache.org by Jason Cox <cs...@gmail.com> on 2008/11/05 01:53:45 UTC

[users@httpd] Re: High "sending reply" count, server stops responding

Bump....

On Thu, Oct 23, 2008 at 6:10 PM, Jason Cox <cs...@gmail.com> wrote:
> I am seeing this issue on my web servers. It all started a couple of
> weeks ago. I looked at my change logs and see no changes that would
> correlate to this happening.
> So this is what happens: a child process reaches MaxRequestsPerChild
> (10000 in this case) and dies off, but the parent process still thinks
> it is running. On the server-status page these processes show up in the
> "Sending Reply" state.
>
> Here is a scoreboard from one of my servers:
>
> Scoreboard: WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_____________________W_WWW_________________....................................................................................................................................................................
>
> Here is a small part of what I see on the server-status page:
>
> 0-0     21171   1/9999/9999     W       12.31   3081    15      0.0     76.34   76.34
>        10.241.48.117   www.blah.com    GET /js/s_code_remote.js HTTP/1.0
> 1-0     21172   1/9995/9995     W       12.65   2719    33      0.0     65.51   65.51
>        10.241.48.149   free.blah.com   GET /js/player.js HTTP/1.0
> 2-0     21173   1/9996/9996     W       12.37   1725    17      0.0     67.84   67.84
>        10.241.48.119   free.blah.com   GET /js/site_catalyst.js HTTP/1.0
> 3-0     21174   1/9996/9996     W       12.46   2718    3       0.0     66.14   66.14
>        10.241.48.118   free.blah.com   GET /js/cookies.js HTTP/1.0
> 4-0     21175   1/9997/9997     W       12.23   2711    95      0.7     70.41   70.41
>        10.241.48.147   free.blah.com   GET /playXML/track/13356268?r=0.33 HTTP/1.0
> 5-0     21176   1/9993/9993     W       12.59   2233    7       0.0     65.62   65.62
>        10.241.48.148   free.blah.com   GET /js/google_adsense.js HTTP/1.0
> 6-0     21177   1/10000/10000   W       12.41   2367    15      0.0     64.63   64.63
>        10.241.48.117   free.blah.com   GET /js/global.js HTTP/1.0
> 7-0     21178   1/9997/9997     W       12.49   2606    3       0.0     63.83   63.83
>        10.241.48.148   free.blah.com   GET /js/table_constructor.js HTTP/1.0
> 8-0     21179   1/9997/9997     W       12.32   2900    17      0.0     67.04   67.04
>        10.241.48.117   free.blah.com   GET /js/cookies.js HTTP/1.0
>
>
> Below is output I captured from a web server that stopped
> responding to web requests altogether.
>
> [ ~]# ps ax |grep httpd
> 21457 ?        Ss     0:01 /usr/local/www/bin/httpd
>  1414 pts/0    S+     0:00 grep httpd
>
> [ ~]# lsof -p 21457
> COMMAND   PID USER   FD   TYPE    DEVICE    SIZE      NODE NAME
> httpd   21457 root  cwd    DIR       8,1    4096         2 /
> httpd   21457 root  rtd    DIR       8,1    4096         2 /
> httpd   21457 root  txt    REG       8,3  501900       471 /usr1/www/bin/httpd
> httpd   21457 root  mem    REG       0,0                 0 [vdso] (stat: No such file or directory)
> httpd   21457 root  mem    REG       8,1  126648     36049 /lib/ld-2.3.5.so
> httpd   21457 root  mem    REG       8,1 1489572     36050 /lib/libc-2.3.5.so
> httpd   21457 root  mem    REG       8,1   25476    106215 /usr/lib/libgdbm.so.2.0.0
> httpd   21457 root  mem    REG       8,1   27660     36057 /lib/libcrypt-2.3.5.so
> httpd   21457 root  mem    REG       8,1  196676     36058 /lib/libm-2.3.5.so
> httpd   21457 root  mem    REG       8,1  125160    106805 /usr/lib/libexpat.so.0.5.0
> httpd   21457 root  mem    REG       8,1   46552     31966 /lib/libnss_files-2.3.5.so
> httpd   21457 root  DEL    REG       0,8            983040 /SYSV00000000
> httpd   21457 root    0r   CHR       1,3              1326 /dev/null
> httpd   21457 root    1w   CHR       1,3              1326 /dev/null
> httpd   21457 root    2w  FIFO       0,6         460663664 pipe
> httpd   21457 root    3r  FIFO       0,6         460663693 pipe
> httpd   21457 root    4w  FIFO       0,6         460663664 pipe
> httpd   21457 root    5w  FIFO       0,6         460663666 pipe
> httpd   21457 root    6w  FIFO       0,6         460663668 pipe
> httpd   21457 root    7w  FIFO       0,6         460663673 pipe
> httpd   21457 root    8w  FIFO       0,6         460663674 pipe
> httpd   21457 root    9w  FIFO       0,6         460663693 pipe
> httpd   21457 root   10r  FIFO       0,6         460663696 pipe
> httpd   21457 root   11w  FIFO       0,6         460663696 pipe
> httpd   21457 root   12r  FIFO       0,6         460663699 pipe
> httpd   21457 root   13w  FIFO       0,6         460663699 pipe
> httpd   21457 root   14r  FIFO       0,6         460663702 pipe
> httpd   21457 root   15u  IPv4 460663477               TCP *:http (LISTEN)
> httpd   21457 root   16w  FIFO       0,6         460663702 pipe
> httpd   21457 root   17r  FIFO       0,6         460663705 pipe
> httpd   21457 root   18w  FIFO       0,6         460663705 pipe
>
> [ ~]# strace -p 21457
> Process 21457 attached - interrupt to quit
> select(0, NULL, NULL, NULL, {0, 540000}) = 0 (Timeout)
> time(NULL)                              = 1224806756
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806757
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806758
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806759
> select(19, NULL, [9 11 13 16 18], NULL, {0, 0}) = 5 (out [9 11 13 16 18], left {0, 0})
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806760
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806761
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806762
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806763
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806764
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> time(NULL)                              = 1224806765
> waitpid(-1, 0xbfc31190, WNOHANG)        = 0
> select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>
> Process 21457 detached
>
> Here is the server info. It is FC4:
>
> [ ~]# uname -a
> Linux web10.sd  2.6.17-1.2142smp #1 SMP Thu Sep 14 15:27:50 PDT 2006 i686 i686 i386 GNU/Linux
>
> [ ~]# httpd -V
> Server version: Apache/1.3.34 (Unix)
> Server built:   Aug 14 2006 16:11:23
> Server's Module Magic Number: 19990320:18
> Server compiled with....
>  -D HAVE_MMAP
>  -D HAVE_SHMGET
>  -D USE_SHMGET_SCOREBOARD
>  -D USE_MMAP_FILES
>  -D HAVE_FCNTL_SERIALIZED_ACCEPT
>  -D HAVE_SYSVSEM_SERIALIZED_ACCEPT
>  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
>  -D DYNAMIC_MODULE_LIMIT=64
>  -D HARD_SERVER_LIMIT=256
>  -D HTTPD_ROOT="/usr/local/www"
>  -D SUEXEC_BIN="/usr/local/www/bin/suexec"
>  -D DEFAULT_PIDLOG="logs/httpd.pid"
>  -D DEFAULT_SCOREBOARD="logs/httpd.scoreboard"
>  -D DEFAULT_LOCKFILE="logs/httpd.lock"
>  -D DEFAULT_ERRORLOG="logs/error_log"
>  -D TYPES_CONFIG_FILE="conf/mime.types"
>  -D SERVER_CONFIG_FILE="conf/httpd.conf"
>  -D ACCESS_CONFIG_FILE="conf/access.conf"
>  -D RESOURCE_CONFIG_FILE="conf/srm.conf"
>
> Here is the top part of my httpd.conf:
>
> #################################
> ### SECTION 1: Global Environment
> #################################
>
> ServerType              standalone
> Port                    80
> HostnameLookups         Off
> User                    wwwadmin
> Group                   wwwadmin
>
>
> Listen                  "80"
> ServerRoot              "/usr/local/www"
> DocumentRoot            "/usr/local/www/htdocs"
>
> LockFile                /var/lock/httpd.lock
> PidFile                 logs/httpd.pid
> ScoreBoardFile          logs/apache_runtime_status
> Timeout                 120
> ExtendedStatus          On
> UseCanonicalName        On
> ServerSignature         Off
> ServerTokens            prod
> UserDir                 disabled
>
> AddDefaultCharset utf-8
>
> ###SERVER TUNING###
>
> KeepAlive               Off
> MaxKeepAliveRequests    100
> KeepAliveTimeout        15
> MinSpareServers         10
> MaxSpareServers         20
> StartServers            50
> MaxClients              125
> MaxRequestsPerChild     10000
>
>
> If anyone else has seen this behavior before I would appreciate some
> help. Thanks.
> ---
> Jason Cox
>
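
For anyone comparing scoreboards across servers, here is a minimal
sketch that tallies the slot states in a scoreboard string like the one
quoted above. The state names follow the key printed on the
server-status page; the function name and the shortened sample string
are hypothetical and only for illustration.

```python
from collections import Counter

# Scoreboard key as printed on the mod_status server-status page.
STATES = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "L": "Logging",
    "G": "Gracefully finishing",
    ".": "Open slot with no current process",
}


def tally_scoreboard(scoreboard):
    """Return a dict mapping state name to the number of slots in it."""
    counts = Counter(scoreboard.strip())
    return {STATES.get(ch, "Unknown (%s)" % ch): n for ch, n in counts.items()}


if __name__ == "__main__":
    # Shortened, made-up sample; paste the real scoreboard line instead.
    sample = "WWWW__W_...."
    for state, n in sorted(tally_scoreboard(sample).items()):
        print("%-35s %d" % (state, n))
```

Run periodically, a "Sending Reply" count that keeps climbing toward
MaxClients would match the symptom described in the quoted message.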



-- 
Jason Cox

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


[users@httpd] Re: High "sending reply" count, server stops responding

Posted by Jason Cox <cs...@gmail.com>.
Wow, so no help? No one has any ideas on what might be going on? Is
this even the correct forum to ask for help? Would it help if I
mentioned that I cannot reproduce this in my DEV or QA server
environments? Only the production servers are seeing this issue,
and the only difference between Prod and QA/DEV is the server hardware. I
wanted to leave upgrading Apache as my last resort, but if I cannot
get help I guess that is the only thing left to try right now.
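
One way to confirm the symptom (the scoreboard listing children that
have already exited) is to check whether each PID on the server-status
page still exists. The following is only a rough sketch, assuming a Unix
host and that the status rows look like the ones quoted in this thread;
the function names are hypothetical. It needs to run on the web server
itself, with the saved server-status text as input.

```python
import os
import re

# Start of a status row, e.g. "0-0     21171   1/9999/9999     W   ..."
# Groups: slot number, generation, PID, state letter.
ROW = re.compile(r"^\s*(\d+)-(\d+)\s+(\d+)\s+\S+\s+(\S)\s")


def parse_rows(text):
    """Extract (slot, pid, state) tuples from pasted server-status text."""
    rows = []
    for line in text.splitlines():
        line = line.lstrip("> ")  # tolerate mail quoting
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(3)), m.group(4)))
    return rows


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def stale_slots(text):
    """List (slot, pid) pairs the scoreboard shows but the OS does not."""
    return [
        (slot, pid)
        for slot, pid, state in parse_rows(text)
        if state != "." and not pid_alive(pid)
    ]
```

If stale_slots() returns entries for slots shown as "W", the parent is
tracking children that no longer exist, which would be consistent with
the ps output in the original message showing only the parent httpd.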




-- 
Jason Cox
