
Posted to bugs@httpd.apache.org by bu...@apache.org on 2012/10/13 03:30:50 UTC

[Bug 53999] New: Deadlock between procmgr_send_spawn_cmd()/proctable_lock() in mod_fcgid and listener_thread() in worker_mpm

https://issues.apache.org/bugzilla/show_bug.cgi?id=53999

          Priority: P2
            Bug ID: 53999
          Assignee: bugs@httpd.apache.org
           Summary: Deadlock between
                    procmgr_send_spawn_cmd()/proctable_lock() in mod_fcgid
                    and listener_thread() in worker_mpm
          Severity: normal
    Classification: Unclassified
                OS: Linux
          Reporter: zerg2000-apachebug@astral.org.pl
          Hardware: PC
            Status: NEW
           Version: 2.4.3
         Component: mod_fcgid
           Product: Apache httpd-2

Hello!

I would like to migrate servers from Apache 2.2 to 2.4, but a deadlock that was
quite rare in 2.2 makes 2.4 unusable. It occurs between listener_thread() in
worker_mpm and at least these mod_fcgid functions:
procmgr_send_spawn_cmd()/proctable_lock()/proctable_pm_lock(). In Apache 2.2 it
occurred only under heavy load, while in 2.4 it occurs within a few minutes to
half an hour after server start, even under low server load.

The deadlock results in the httpd process terminating itself and leaving many
fcgid processes in a busy state that never changes to idle. Busy processes turn
into zombies after some time (possibly after process_kill_gracefully()), so
process_kill_gracefully()/process_kill_force() in scan_busylist() cannot kill
them, and since they are on the busy list they are never collected by
scan_idlelist_zombie(). The final result is full saturation of the
FcgidMaxProcessesPerClass limit by zombies for some VirtualHosts, which in turn
results in permanent 503 errors until the whole server is restarted (a graceful
restart is enough).
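
To illustrate why signalling a zombie does not free its slot, here is a minimal
standalone sketch. It is not mod_fcgid code: all function names below are
hypothetical, and it only assumes a POSIX system. A child that has already
exited still accepts kill(), but it disappears only when the parent waits for
it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once. Until the parent
 * waits for it, the child stays in the process table as a zombie. */
pid_t spawn_short_lived_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        _exit(0);
    }
    return pid;   /* child PID in the parent, -1 if fork() failed */
}

/* Block until the child has exited but leave it unreaped (WNOWAIT), so it
 * is a zombie when this returns 0. */
int wait_until_zombie(pid_t pid)
{
    siginfo_t info;
    int rv = waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    if (rv == 0) {
        assert(info.si_pid == pid);   /* waitid reported the child we asked for */
    }
    return rv;
}

/* Non-blocking reap: 1 if the exited child was collected, 0 if it is still
 * running, -1 on error. */
int reap_if_exited(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) {
        return 1;
    }
    return (r == 0) ? 0 : -1;
}

/* Returns 1 if a zombie accepted SIGKILL, still occupied its PID afterwards,
 * and went away only through waitpid(); -1 if the setup failed. */
int zombie_survives_kill(void)
{
    pid_t pid = spawn_short_lived_child();
    if (pid < 0 || wait_until_zombie(pid) != 0) {
        return -1;
    }
    int kill_rv = kill(pid, SIGKILL);        /* succeeds but changes nothing */
    int still_there = (kill(pid, 0) == 0);   /* PID is still occupied */
    int reaped = reap_if_exited(pid);        /* only waiting frees the entry */
    return kill_rv == 0 && still_there && reaped == 1;
}
```

In terms of the report above, a busy-list scan that only sends signals would
keep seeing such an entry; some waitpid()-style call on the busy entries would
be needed to release it.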

I am able to reproduce the problem on a machine with 2 processors by creating 5
VirtualHosts, each pointing to a separate Joomla installation, and then
executing the following test on all VirtualHosts simultaneously:

ab -n 10000 -c 10 vhostX/

Example MPM/mod_fcgid configuration that triggers the deadlock:

<IfModule worker.c>
    ServerLimit                  3
    ThreadLimit                100
    StartServers                 1
    MaxClients                 300
    MinSpareThreads             50
    MaxSpareThreads            175
    ThreadsPerChild            100
    MaxConnectionsPerChild  100000
</IfModule>
<IfModule mod_fcgid.c>
   FcgidMaxProcesses 400
   FcgidMinProcessesPerClass 0
   FcgidMaxProcessesPerClass 5
   FcgidMaxRequestsPerProcess 500
   FcgidInitialEnv PHP_FCGI_MAX_REQUESTS 500
   FcgidProcessLifeTime 135
   FcgidSpawnScore 1
   FcgidTerminationScore 1
   FcgidTimeScore 100
   FcgidSpawnScoreUpLimit 100
   FcgidIOTimeout 65
   FcgidIdleTimeout 60
   FcgidIdleScanInterval 30
   FcgidBusyTimeout 65
   FcgidBusyScanInterval 60
</IfModule>

System: Debian 6.0.x 64 bit
Apache: 2.4.3
mod_fcgid: SVN rev. 1397462 (bug also present in 2.3.7 and 2.3.6)
Compilation flags: -O0 -ggdb

The deadlock is usually triggered within 5 minutes of starting ab.

----------[ Debug info from deadlock on the worker_mpm side ]----------
Error in logfile:

[mpm_worker:emerg] [pid 13326:tid 139649122420480] (35)Resource deadlock
avoided: AH00273: apr_proc_mutex_lock failed. Attempting to shutdown process
gracefully.

Deadlock in listener_thread() at worker.c:762:

        /* We've already decremented the idle worker count inside               
         * ap_queue_info_wait_for_idler. */

====>   if ((rv = SAFE_ACCEPT(apr_proc_mutex_lock(accept_mutex)))
            != APR_SUCCESS) {

            if (!listener_may_exit) {
                accept_mutex_error("lock", rv, process_slot);
            }
            break;                    /* skip the lock release */
        }

Backtrace:

#0  0x00007f5003d9f45c in __pthread_kill (threadid=<value optimized out>,
signo=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63
#1  0x000000000047bf64 in wakeup_listener () at worker.c:287
#2  0x000000000047bf97 in signal_threads (mode=1) at worker.c:310
#3  0x000000000047cd7c in accept_mutex_error (func=0x4929b8 "lock", rv=35,
process_slot=0) at worker.c:678
#4  0x000000000047d0b8 in listener_thread (thd=0x21ccb18, dummy=0x209b290) at
worker.c:766
#5  0x00007f5003fdbc03 in ?? () from /usr/lib/libapr-1.so.0
#6  0x00007f5003d998ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#7  0x00007f5003b0092d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8  0x0000000000000000 in ?? ()

Local variables:

ti = 0x209b290
process_slot = 0
tpool = 0x21edb18
csd = 0x7f4fc4086d30
ptrans = 0x7f4fc4018a68
pollset = 0x21edb90
rv = 35                  <=== EDEADLK
lr = 0x1e97178
have_idle_worker = 1
last_poll_idx = 1

----------[ Debug info from deadlock on the mod_fcgid side ]----------
Error in logfile:

[fcgid:emerg] [pid 11624:tid 139810533033728] (35)Resource deadlock avoided:
[client x.x.x.x:49337] mod_fcgid: can't get pipe mutex

There is also another one in proctable_lock()/proctable_pm_lock(), but it is
quite rare and I did not get a backtrace from it:

[fcgid:emerg] [pid 27202:tid 140334891595520] (35)Resource deadlock avoided:
[client x.x.x.x:33273] mod_fcgid: can't lock process table in pid 27202

Deadlock in procmgr_send_spawn_cmd() at fcgid_pm_unix.c:467:

         /* Get the global mutex before posting the request */
====>    if ((rv = apr_global_mutex_lock(g_pipelock)) != APR_SUCCESS) {
             ap_log_rerror(APLOG_MARK, APLOG_EMERG, rv, r,
                           "mod_fcgid: can't get pipe mutex");
             exit(0);
         }

Backtrace:
#0  procmgr_send_spawn_cmd (command=0x7f5311e0b520, r=0x1403738) at
fcgid_pm_unix.c:468
#1  0x00007f532808082b in handle_request (r=0x1403738, role=1,
cmd_conf=0x13fa380, output_brigade=0x1404f70) at fcgid_bridge.c:450
#2  0x00007f5328081769 in bridge_request (r=0x1403738, role=1,
cmd_conf=0x13fa380) at fcgid_bridge.c:765
#3  0x00007f532807df92 in fcgid_handler (r=0x1403738) at mod_fcgid.c:290
#4  0x000000000045662b in ap_run_handler (r=0x1403738) at config.c:169
#5  0x0000000000456f0b in ap_invoke_handler (r=0x1403738) at config.c:432
#6  0x0000000000473963 in ap_internal_redirect_handler (new_uri=0x1403718
"/cgi-bin/php-fcgi/index.php", r=0x1576880) at http_request.c:669
#7  0x00007f53288ad394 in action_handler (r=0x1576880) at mod_actions.c:205
#8  0x000000000045662b in ap_run_handler (r=0x1576880) at config.c:169
#9  0x0000000000456f0b in ap_invoke_handler (r=0x1576880) at config.c:432
#10 0x00000000004727b3 in ap_process_async_request (r=0x1576880) at
http_request.c:317
#11 0x0000000000472899 in ap_process_request (r=0x1576880) at
http_request.c:363
#12 0x000000000046ed81 in ap_process_http_sync_connection (c=0x13ef3b8) at
http_core.c:190
#13 0x000000000046ee97 in ap_process_http_connection (c=0x13ef3b8) at
http_core.c:231
#14 0x0000000000463bf2 in ap_run_process_connection (c=0x13ef3b8) at
connection.c:41
#15 0x00000000004640bf in ap_process_connection (c=0x13ef3b8, csd=0x13ef1a0) at
connection.c:202
#16 0x000000000047ca2c in process_socket (thd=0x1099478, p=0x13ef118,
sock=0x13ef1a0, my_child_num=1, my_thread_num=42, bucket_alloc=0x13f1128)
    at worker.c:620
#17 0x000000000047d8a9 in worker_thread (thd=0x1099478, dummy=0x12056b0) at
worker.c:979
#18 0x00007f532dfc4c03 in ?? () from /usr/lib/libapr-1.so.0
#19 0x00007f532dd828ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#20 0x00007f532dae992d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#21 0x0000000000000000 in ?? ()

Local variables:

rv = 35                <=== EDEADLK
result = 0
notifybyte = 0 '\000'
nbytes = 13160


procmgr_send_spawn_cmd() executes exit(0) on error when trying to lock the
mutex. Shouldn't it return a status code that indicates an error? The log also
gives no indication that the httpd process is actually terminated: "mod_fcgid:
can't get pipe mutex" does not look too harmful except for the emerg level.
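
A rough sketch of the alternative asked about above, returning the lock error
to the caller instead of calling exit(0). This is not the mod_fcgid
implementation: send_spawn_cmd_checked(), lock_fn and the two stub lock
functions are hypothetical names used only for illustration, and the real
function would also have to post the request and release the mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a lock call such as apr_global_mutex_lock():
 * returns 0 on success or an errno-style code on failure. */
typedef int (*lock_fn)(void *mutex);

/* Hypothetical variant of a spawn-request sender that reports a lock
 * failure to its caller instead of terminating the whole process. */
int send_spawn_cmd_checked(lock_fn lock, void *mutex)
{
    assert(lock != NULL);   /* programming error, not a runtime condition */

    int rv = lock(mutex);
    if (rv != 0) {
        /* Say explicitly what happens next, so the log is not misleading. */
        fprintf(stderr,
                "can't get pipe mutex: %s; request aborted, process keeps "
                "running\n", strerror(rv));
        return rv;          /* caller can answer this request with an error */
    }

    /* ... post the spawn request and release the mutex here ... */
    return 0;
}

/* Stub lock functions simulating the two outcomes. */
int lock_ok(void *mutex)       { (void)mutex; return 0; }
int lock_deadlock(void *mutex) { (void)mutex; return EDEADLK; }
```

With this shape, a caller that receives EDEADLK (35) could fail only the
affected request, for example with a 503 response, while the other worker
threads in the same httpd process continue to run.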

Should I file a separate bug report for busy zombies not being collected?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 53999] Deadlock between procmgr_send_spawn_cmd()/proctable_lock() in mod_fcgid and listener_thread() in worker_mpm

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53999

Bartosz Kwitniewski <ze...@astral.org.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Bartosz Kwitniewski <ze...@astral.org.pl> ---
I'm unable to reproduce the error on Linux 3.16.6 and Apache 2.4.10 anymore. I
have received a report that a kernel upgrade by itself can fix this bug.
Closing.


[Bug 53999] Deadlock between procmgr_send_spawn_cmd()/proctable_lock() in mod_fcgid and listener_thread() in worker_mpm

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53999

Bartosz Kwitniewski <ze...@astral.org.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|2.4.3                       |2.4.4

--- Comment #1 from Bartosz Kwitniewski <ze...@astral.org.pl> ---
I was unable to reproduce this bug with the event MPM using unencrypted
connections, but it is still present when using SSL, probably due to this: "For
SSL connections, this MPM will fall back to the behaviour of the worker MPM and
reserve one worker thread per connection."
