You are viewing a plain text version of this content. The canonical link for it is here.
Posted to bugs@httpd.apache.org by bu...@apache.org on 2015/10/16 22:23:09 UTC

[Bug 58503] New: segfault in apr_brigade_cleanup() under high load with slow clients

https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

            Bug ID: 58503
           Summary: segfault in apr_brigade_cleanup() under high load with
                    slow clients
           Product: Apache httpd-2
           Version: 2.4.16
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: critical
          Priority: P2
         Component: mpm_worker
          Assignee: bugs@httpd.apache.org
          Reporter: s.tabacchiera@gmail.com

apache 2.4.17 on RHEL 6.3 64bit

SCENARIO: 
Client -> apache/mod_proxy_wstunnel -> tomcat

Under high load (multiple busy threads), with many slow clients (expeciallu
mobile users during roaming, who often go offline), we experience most threads
are waiting for clients to read() from their side, netstat showing a large
Send-Q, when clients abort or apache Timeout expires, httpd core dumps with
segfault.

gdb bt:
#0  0x00007f69ec6abe17 in apr_brigade_cleanup (data=0x7f692646ffb8) at
buckets/apr_brigade.c:44
#1  0x00007f69ec265b1e in run_cleanups (pool=0x7f692646e028) at
memory/unix/apr_pools.c:2352
#2  apr_pool_destroy (pool=0x7f692646e028) at memory/unix/apr_pools.c:814
#3  0x00007f69ec6abe2c in apr_brigade_cleanup (data=0x7f6926489860) at
buckets/apr_brigade.c:44
#4  0x0000000000448dda in ap_core_output_filter (f=0x7f6926489690,
new_bb=0x7f6926489860) at core_filters.c:385
#5  0x0000000000463fa3 in ap_process_request_after_handler (r=<value optimized
out>) at http_request.c:267
#6  0x00000000004645af in ap_process_request (r=0x7f692646e0a0) at
http_request.c:373
#7  0x00000000004608ee in ap_process_http_sync_connection (c=0x7f69264892b8) at
http_core.c:210
#8  ap_process_http_connection (c=0x7f69264892b8) at http_core.c:251
#9  0x0000000000457fc0 in ap_run_process_connection (c=0x7f69264892b8) at
connection.c:41
#10 0x000000000046b468 in process_socket (thd=0x7f69e6963f28, dummy=<value
optimized out>) at worker.c:631
#11 worker_thread (thd=0x7f69e6963f28, dummy=<value optimized out>) at
worker.c:990
#12 0x0000003f3a807a51 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003f3a0e893d in clone () from /lib64/libc.so.6

ENVIRONMENT:
RHEL 6.3 64bit on vmware
Apache 2.4.17 worker, with apr-1.5.2, apr-util-1.5.4, openssl 1.0.1p.

Same issue occurring with 2.4.16, 2.2.31, 2.2.29, with and w/o ssl.
Same issue occurring with event mpm.
Issue NOT OCCURRING with prefork mpm.

It seems a thread-synchronization issue.

HOW TO REPRODUCE:
Just start a number of clients who, after the initial websocket request, read
very slowly. Let the Send-Q fills to ~128KB, with high thread-per-child
concurrency. Then let all clients abort simultaneously, or let the Timeout
expires. Apache child exit with signal (11) and core dumps.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

Bernard Cafarelli <vo...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |voyageursp@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #16 from Yann Ylavic <yl...@gmail.com> ---
(In reply to Yann Ylavic from comment #15)
> Thanks ST for this report and follow up, a different fix was backported to
> upcoming 2.4.19 in r1732266, so I'm closing here.
> 
> If you manage to test the final fix in your enviroment and have any issue
> with it, please reopen.

Here is the link of the patch (all in one) which was backported (and hence
could be applied to 2.4.18):
http://home.apache.org/~rpluem/patches/proxy_correctlifetime.diff

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #14 from Tabby <s....@gmail.com> ---
Hi there,
any news about this patch? It was not included in 2.4.18, so I'm wondering
if/when will it ever be.

Thanks!
ST

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

Tabby <s....@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |s.tabacchiera@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

Yann Ylavic <yl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #15 from Yann Ylavic <yl...@gmail.com> ---
Thanks ST for this report and follow up, a different fix was backported to
upcoming 2.4.19 in r1732266, so I'm closing here.

If you manage to test the final fix in your enviroment and have any issue with
it, please reopen.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #8 from Yann Ylavic <yl...@gmail.com> ---
(In reply to Ruediger Pluem from comment #7)
> Actually I think mod_proxy_wstunnel falls into the same pitfall
> mod_proxy_http was in and it needs to do the same / something similar then
> mod_proxy_http.c with
> 
> proxy_buckets_lifetime_transform

Yes I agree, just proposed a quick patch to determine whether it came from
mod_proxy_wstunnel or some failure in the core deferred/pipelined write
logic...

We need to either use lifetime_transform like in proxy_http, or I was thinking
of modifying all the input filters that create their buckets on f->r/c's
pool/bucket_alloc so that they now use their given bb->p and bb->bucket_alloc.

By doing the latter, we wouldn't have to transform the lifetime, it would be
determined by the caller...

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #11 from Ruediger Pluem <rp...@apache.org> ---
(In reply to Tabby from comment #10)
> (In reply to Ruediger Pluem from comment #6)
> > While it is nice that the patch works for you I suspect that it hides
> > something else. Can you please do a 
> > 
> > print e
> > print *e
> > 
> > in gdb in the thread that causes the segfault?
> 
> Sorry, I've deleted the old core.
> 

> 
> (gdb) p e
> $1 = (apr_bucket *) 0x7efee4018348
> 
> (gdb) p *e
> $2 = {link = {next = 0x2c6c6c756e3a226e, prev = 0x4974616d726f6622}, type =
> 0x4f444e49223a2264, length = 3180190329187026254, start =
> 6229711307259471394, 
>   data = 0x2d223a2265756c61, free = 0x7b2c7d2235322e31, list =
> 0x6e6f697372657622}

Thanks. What does

print *(e->type)
x/10x 0x2c6c6c756e3a226e,
x/10x 0x4974616d726f6622
x/10x 0x7b2c7d2235322e31
dump_brigade b
dump_bucket e

deliver?
Keep in mind that you need to have the macros .gdbinit (top level of source
directory) loaded for the dump_* commands to work.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #7 from Ruediger Pluem <rp...@apache.org> ---
Actually I think mod_proxy_wstunnel falls into the same pitfall mod_proxy_http
was in and it needs to do the same / something similar then mod_proxy_http.c
with

proxy_buckets_lifetime_transform

Just have no further time investigating right now.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #13 from Tabby <s....@gmail.com> ---
After 1 month I can confirm the issue is gone.
Will this patch be included in the upcoming release?

Thanks
ST

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

Yann Ylavic <yl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ylavic.dev@gmail.com

--- Comment #2 from Yann Ylavic <yl...@gmail.com> ---
Created attachment 33184
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33184&action=edit
cleanup mod_proxy_wstunnel's backend brigades

Can you please try this patch?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #1 from Tabby <s....@gmail.com> ---
*** Bug 58502 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #10 from Tabby <s....@gmail.com> ---
(In reply to Ruediger Pluem from comment #6)
> While it is nice that the patch works for you I suspect that it hides
> something else. Can you please do a 
> 
> print e
> print *e
> 
> in gdb in the thread that causes the segfault?

Sorry, I've deleted the old core.

Consider this new one, with ssl enabled:
#0  0x00007eff54952e17 in apr_brigade_cleanup (data=0x7eff2c01c2c8) at
buckets/apr_brigade.c:44
#1  0x00007eff5450caee in run_cleanups (pool=0x7eff2c01a2b8) at
memory/unix/apr_pools.c:2352
#2  apr_pool_destroy (pool=0x7eff2c01a2b8) at memory/unix/apr_pools.c:814
#3  0x00007eff54952e2c in apr_brigade_cleanup (data=0x7eff2c0062a8) at
buckets/apr_brigade.c:44
#4  0x00007eff5165db0e in ssl_io_filter_output (f=0x7eff480567f8,
bb=0x7eff2c0062a8) at ssl_engine_io.c:1612
#5  0x00007eff5165e125 in ssl_io_filter_coalesce (f=0x7eff480567d0,
bb=0x7eff2c0062a8) at ssl_engine_io.c:1599
#6  0x0000000000463fa3 in ap_process_request_after_handler (r=<value optimized
out>) at http_request.c:267
#7  0x00000000004645af in ap_process_request (r=0x7eff2c01a330) at
http_request.c:373
#8  0x00000000004608ee in ap_process_http_sync_connection (c=0x7eff48056388) at
http_core.c:210
#9  ap_process_http_connection (c=0x7eff48056388) at http_core.c:251
#10 0x0000000000457fc0 in ap_run_process_connection (c=0x7eff48056388) at
connection.c:41
#11 0x000000000046b468 in process_socket (thd=0xfa9928, dummy=<value optimized
out>) at worker.c:631
#12 worker_thread (thd=0xfa9928, dummy=<value optimized out>) at worker.c:990
#13 0x0000003f3a807a51 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003f3a0e893d in clone () from /lib64/libc.so.6

(gdb) p e
$1 = (apr_bucket *) 0x7efee4018348

(gdb) p *e
$2 = {link = {next = 0x2c6c6c756e3a226e, prev = 0x4974616d726f6622}, type =
0x4f444e49223a2264, length = 3180190329187026254, start = 6229711307259471394, 
  data = 0x2d223a2265756c61, free = 0x7b2c7d2235322e31, list =
0x6e6f697372657622}

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #12 from Tabby <s....@gmail.com> ---
(gdb) p *(e->type)
Cannot access memory at address 0x4f444e49223a2264

(gdb) x/10x 0x2c6c6c756e3a226e
0x2c6c6c756e3a226e:     Cannot access memory at address 0x2c6c6c756e3a226e

(gdb) x/10x 0x4974616d726f6622
0x4974616d726f6622:     Cannot access memory at address 0x4974616d726f6622

(gdb) x/10x 0x7b2c7d2235322e31
0x7b2c7d2235322e31:     Cannot access memory at address 0x7b2c7d2235322e31

(gdb) dump_brigade b
dump of brigade 0x7eff2c01c2c8
   | type     (address)    | length | data addr  | contents               | rc
--------------------------------------------------------------------------------
 0 | Cannot access memory at address 0x4f444e49223a2264

(gdb) dump_bucket e
 bucket=Cannot access memory at address 0x4f444e49223a2264

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #9 from Yann Ylavic <yl...@gmail.com> ---
Discussion moved to the dev@ mailing-list, will report here afterward.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

Yann Ylavic <yl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33184|0                           |1
        is obsolete|                            |

--- Comment #3 from Yann Ylavic <yl...@gmail.com> ---
Created attachment 33185
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33185&action=edit
Cleanup mod_proxy_wstunnel's backend brigades

Also cleans up EPIPE case.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #4 from Tabby <s....@gmail.com> ---
Hi and thanks for your quick feedback, Yann!

I applied your patch and did some quick load tests and all seems to work
smoothly.

Next, I'm trying it in the real scenario and let you know.
Finger crossed, for now.

ST

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #5 from Tabby <s....@gmail.com> ---
After 2 days, no more segfaults.
Stil heavily testing, but I'm pretty confident the issue has gone.

ST

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 58503] segfault in apr_brigade_cleanup() under high load with slow clients

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58503

--- Comment #6 from Ruediger Pluem <rp...@apache.org> ---
While it is nice that the patch works for you I suspect that it hides something
else. Can you please do a 

print e
print *e

in gdb in the thread that causes the segfault?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org