Posted to bugs@httpd.apache.org by bu...@apache.org on 2005/08/23 18:40:04 UTC

DO NOT REPLY [Bug 36324] New: - runaway child process stuck in loop in apr_pool_cleanup_kill()

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36324>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=36324

           Summary: runaway child process stuck in loop in
                    apr_pool_cleanup_kill()
           Product: Apache httpd-2.0
           Version: 2.0.54
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: stuart@terminus.co.uk


A child process on one of our servers has got stuck eating all available CPU.
We've seen this maybe a couple of times now (not very often, given the number of
requests we serve). Now that we have -g builds as standard, I was able to attach
gdb to it this time and found the following:

[Switching to Thread 1 (LWP 1)]
0xff1de3f4 in apr_pool_cleanup_kill (p=0x3cd3c0, data=0x3bf9e8,
cleanup_fn=0xff355588 <pool_bucket_cleanup>) at apr_pools.c:1911
1911    apr_pools.c: No such file or directory.
        in apr_pools.c
(gdb) bt
#0  0xff1de3f4 in apr_pool_cleanup_kill (p=0x3cd3c0, data=0x3bf9e8,
cleanup_fn=0xff355588 <pool_bucket_cleanup>) at apr_pools.c:1911
#1  0xff35571c in pool_bucket_destroy (data=0x3bf9e8) at apr_buckets_pool.c:79
#2  0xff3575c8 in apr_brigade_cleanup (data=0x3deb80) at apr_brigade.c:48
#3  0x0004fabc in send_parsed_content (f=0x3c2488, bb=0x3ed238) at
mod_include.c:3311
#4  0x00050a30 in includes_filter (f=0x3c2488, b=0x3ed1c8) at mod_include.c:3591
#5  0x000c14bc in ap_pass_brigade (next=0x3c2488, bb=0x3ed1c8) at util_filter.c:512
#6  0xfee12548 in ap_headers_output_filter (f=0x3ec078, in=0x3ed1c8) at
mod_headers.c:538
#7  0x000c14bc in ap_pass_brigade (next=0x3ec078, bb=0x3ed1c8) at util_filter.c:512
#8  0xfe365b50 in gu_vignette_ape_out_filter (f=0x3eb608, in_brigade=0x3baff8)
at mod_gu_vignette.c:881
#9  0x000c14bc in ap_pass_brigade (next=0x3eb608, bb=0x3baff8) at util_filter.c:512
#10 0xfed75050 in ap_proxy_http_process_response (p=0x3bab00, r=0x3eb468,
p_conn=0x3bb048, origin=0x3bb230, backend=0x3bb060, conf=0x21a228, bb=0x3baff8, 
    server_portstr=0xffbef1d0 "") at proxy_http.c:937
#11 0xfed757e4 in ap_proxy_http_handler (r=0x3eb468, conf=0x21a228, url=0x3bb150
"/", proxyname=0x0, proxyport=0) at proxy_http.c:1107
#12 0xfedb5e54 in proxy_run_scheme_handler (r=0x3eb468, conf=0x21a228,
url=0x3ec04e "http://ape-liveprod.gul3.gnl:4738/", proxyhost=0x0, proxyport=0)
    at mod_proxy.c:1115
#13 0xfedb3ac0 in proxy_handler (r=0x3eb468) at mod_proxy.c:420
#14 0x000a6b0c in ap_run_handler (r=0x3eb468) at config.c:152
#15 0x000a776c in ap_invoke_handler (r=0x3eb468) at config.c:364
#16 0x000d7db8 in ap_run_sub_req (r=0x3eb468) at request.c:1855
#17 0xfe3640d0 in gu_vignette_handler (r=0x3cd3f8) at mod_gu_vignette.c:471
#18 0x000a6b0c in ap_run_handler (r=0x3cd3f8) at config.c:152
#19 0x000a776c in ap_invoke_handler (r=0x3cd3f8) at config.c:364
#20 0x00085198 in ap_process_request (r=0x3cd3f8) at http_request.c:249
#21 0x0007c5c4 in ap_process_http_connection (c=0x3bac10) at http_core.c:251
#22 0x000bc684 in ap_run_process_connection (c=0x3bac10) at connection.c:43
#23 0x000bccb4 in ap_process_connection (c=0x3bac10, csd=0x3bab38) at
connection.c:176
#24 0x000a4280 in child_main (child_num_arg=83) at prefork.c:610
#25 0x000a44b8 in make_child (s=0x1ce020, slot=83) at prefork.c:704
#26 0x000a48e0 in perform_idle_server_maintenance (p=0x1c90f8) at prefork.c:839
#27 0x000a5000 in ap_mpm_run (_pconf=0x1c90f8, plog=0x2011d8, s=0x1ce020) at
prefork.c:1040
#28 0x000af9d0 in main (argc=3, argv=0xffbefccc) at main.c:618

Stepping through, it's just looping round this code:

    while (c) {
        if (c->data == data && c->plain_cleanup_fn == cleanup_fn) {
            *lastp = c->next;
            break;
        }

        lastp = &c->next;
        c = c->next;
    }
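
For anyone not familiar with the idiom, here is a minimal, self-contained
sketch of what that loop does on a well-formed list. It is not the APR source:
cleanup_node, cleanup_fn_t and cleanup_list_kill are hypothetical names, and
the struct models only the next, data and plain_cleanup_fn fields. The point
it illustrates is that the walk can only end on a match or on a NULL next
pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for APR's cleanup_t
 * (child_cleanup_fn is omitted for brevity). */
typedef int (*cleanup_fn_t)(void *);

typedef struct cleanup_node {
    struct cleanup_node *next;
    void *data;
    cleanup_fn_t plain_cleanup_fn;
} cleanup_node;

/* Unlink the first node matching (data, fn) from the list at *head.
 * Returns 1 if a node was removed, 0 if the end of the list was reached.
 * Termination relies on the list being NULL-terminated. */
int cleanup_list_kill(cleanup_node **head, void *data, cleanup_fn_t fn)
{
    cleanup_node **lastp;
    cleanup_node *c;

    assert(head != NULL);
    lastp = head;   /* the link that currently points at c */
    c = *head;

    while (c) {
        if (c->data == data && c->plain_cleanup_fn == fn) {
            *lastp = c->next;   /* splice the matching node out */
            return 1;
        }
        lastp = &c->next;
        c = c->next;
    }
    return 0;
}
```

Because lastp always holds the address of the link pointing at the current
node, removing the head and removing an interior node are the same
assignment. Nothing in the loop guards against a list that never reaches
NULL.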

So, let's have a look at the data we've got:

(gdb) p c
$1 = (cleanup_t *) 0x3ed420
(gdb) p *c
$2 = {next = 0x3c1378, data = 0x3ef428, plain_cleanup_fn = 0x1, child_cleanup_fn
= 0}
(gdb) p *c->next
$3 = {next = 0x3cd3a8, data = 0x3ed420, plain_cleanup_fn = 0x1, child_cleanup_fn
= 0}
(gdb) p *c->next->next
$4 = {next = 0x3ef428, data = 0x3c1378, plain_cleanup_fn = 0x1, child_cleanup_fn
= 0}
(gdb) p *c->next->next->next
$5 = {next = 0x3ed420, data = 0x3cd3a8, plain_cleanup_fn = 0xff355588
<pool_bucket_cleanup>, child_cleanup_fn = 0}

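Whoops - the list has turned back on itself, so we're stuck in the loop.
None of the four nodes in the ring has data == 0x3bf9e8, so the match never
fires, and c never becomes NULL. Presumably (given the loop's NULL check) the
list is meant to be NULL-terminated, so this should never happen.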
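
As a diagnostic aid, a cycle like this can be detected without looping
forever. The following is a rough sketch using Floyd's tortoise-and-hare
walk. It is not part of APR: list_node, cleanup_list_has_cycle and
build_ring_of_four are hypothetical names, and the ring only mimics the shape
of the four-node loop in the gdb output, not its actual addresses or data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for APR's cleanup_t. */
typedef struct list_node {
    struct list_node *next;
    void *data;
} list_node;

/* Floyd's tortoise-and-hare: returns 1 if following next pointers from
 * head ever revisits a node, 0 if the walk reaches NULL. */
int cleanup_list_has_cycle(const list_node *head)
{
    const list_node *slow = head;
    const list_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;          /* one step */
        fast = fast->next->next;    /* two steps */
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}

/* Link four nodes into a ring, so that the fourth node's next points back
 * at the first. nodes must have room for 4 entries. */
void build_ring_of_four(list_node nodes[4])
{
    int i;

    assert(nodes != NULL);
    for (i = 0; i < 4; i++) {
        nodes[i].next = &nodes[(i + 1) % 4];
        nodes[i].data = NULL;
    }
}
```

Calling cleanup_list_has_cycle() on a ring built this way returns 1, while a
NULL-terminated list returns 0. A check along these lines could be used in a
debug build to catch the corruption at the point of traversal, although it
would not explain how the list became circular in the first place.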

This looks related to bug 35974, where we're seeing the addresses of memory
nodes being corrupted (pointing to invalid addresses, which of course results
in seg faults/bus errors on attempted access).

I've asked our systems team to leave that server blocked and the errant process
running, so if you'd like to know the state of anything else, please let me know.
