You are viewing a plain text version of this content. The canonical link for it is here.
Posted to bugs@httpd.apache.org by bu...@apache.org on 2023/06/13 11:12:10 UTC

[Bug 66646] New: apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

            Bug ID: 66646
           Summary: apache http2 h2eos bucket corruption in
                    httpd-2.4.55/56/57 (with workaround)
           Product: Apache httpd-2
           Version: 2.4.57
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_http2
          Assignee: bugs@httpd.apache.org
          Reporter: dzwillo@strato.de
  Target Milestone: ---

Created attachment 38585
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=38585&action=edit
check for h2eos bucket corruption

Hello,

starting with the big mod_http2 update in httpd-2.4.55 we see random
crashes in core_filter.c send_brigade_nonblocking/writev_nonblocking
because of corrupted buckets.

- the crashes do not occur when the current httpd-code (httpd-2.4.57)
  is used with mod_http2 from httpd-2.4.54.

- our servers are configured to run with 64 worker-threads per process,
  and the machines use redhat-linux. For each server which serves 
  about 20 million requests per day, we see between 1 and 5 crashes daily. 

- crashes occur always in the context of a HTTP2-request (mostly simple
  GET-requests). See example below.

- traces show that the crash happens always in cases where a h2eos
  bucket is destroyed and the following bucket from the brigade
  belongs to the same stream.

  see patch: httpd-trunk-mod_http2-check-for-h2eos-bucket-corruption

- the next/prev pointers of this following bucket sometimes point to
  illegal memory and indicate a use-after-free bug where the memory
  gets reused after the bucket-destroy-handler of the h2eos token
  was called.

- the patch checks for such cases in core_filter:send_brigade_nonblocking,
  and shows that the problem occurs far more often than the count of
  crashes we see.

- after a fresh apache restart it always takes some time (20-30 minutes)
  on our busy servers until the bug is triggered, so it might have todo
  with a special behavior of the apache-handlers in situtions where for
  example 'MaxConnectionsPerChild' is reached.

We are currently using a workaround to fix these crashes, which
was built to find out more about the problem:

- in h2_c1_io.c:pass_output the buckets are reordered, so that a 
  flush bucket and scratch heap bucket is inserted before the h2eos bucket
  when mod_http2 passes the output-brigade to the filters of other
  apache modules.

- since the h2eos<->flush swap fix in h2_c1_io.c seems not to be sufficient
  to catch all cases where a flush bucket is inserted, the remaining cases
  are swapped in core_filter.

  see patch: httpd-trunk-mod_http2-workaround-for-h2eos-bucket-corruption
  (should be applied on top of the other patch).

- with this patch applied the crashes are absent & the check doesn't trigger.

I think a real fix must more generally enforce the bucket-order in
mod_http2, so that h2eos is always the last bucket in a http2 stream.
Another possibility might be to allocate the flush/scratch-heap buckets
from a pool with a longer lifetime than the pool used for h2eos.

Example output when _checked_bucket_delete triggers:
---------------------------------------------------
[Tue Jun 06 12:49:55 2023] [error] [client ...] send_brigade_nonblocking_len0:
BAD BUCKET type H2EOS bucket 0x7f681c0009c8 before:(next == 0x7f681c000d88,
prev == 0x7f690c09df98) (next.next == 0x7f690c09df98, next.prev ==
0x7f681c0009c8) (first == 0x7f681c0009c8, last == 0x7f681c000d88) bb-sentinel
0x7f690c09df98 after:next type FLUSH 0x7f681c000d88 (next == 0x7f681c000ba8,
prev == 0x7f690c09df98) bb (first == 0x7f690c09df98, last == 0x7f690c09df98)

- before:
  head:  0x7f690c09df98 [first: h2eos, last: flush]
  h2eos: 0x7f681c0009c8 [next: flush, prev: head]
  flush: 0x7f681c000d88 [next: head, prev: h2eos]

- after bucket_delete(h2eos):
  head:  0x7f690c09df98 [first: head, last: head]
  flush: 0x7f681c000d88 [next: next, prev: head]
  next:  0x7f681c000ba8 ???

Example crash core:
------------------
Note: Line numbers here do not match repo because of included debug code.
      Such crashes occur after a check like above triggered in a 
      preceeding send_brigade_nonblocking loop iteration.

(gdb) bt
  #0  0x00007f1494013c68 in ?? ()
  #1  0x00000000004c9db7 in writev_nonblocking (s=s@entry=0x7f15d80b8340,
bb=bb@entry=0x7f15d80b8af0, bytes_to_write=bytes_to_write@entry=361,
nvec=nvec@entry=4, c=c@entry=0x7f15d80b8558, ctx=<optimized out>,
ctx=<optimized out>) at core_filters.c:850 --> if (n) { apr_bucket_split() ...
}
  #2  0x00000000004caf75 in send_brigade_nonblocking (c=0x7f15d80b8558,
ctx=0x7f1494071678, bb=0x7f15d80b8af0, s=0x7f15d80b8340) at core_filters.c:819
--> if (nvec > 0) { rv = writev_nonblocking .. }
  #3  ap_core_output_filter (f=0x7f1494071348, new_bb=0x7f15d80b8af0) at
core_filters.c:538
  #4  0x00000000005b58bc in ssl_io_filter_output (f=0x7f15d80b8aa8,
bb=0x7f14940717e0) at ssl_engine_io.c:2094
  #5  0x00000000005b234e in ssl_io_filter_coalesce (f=0x7f15d80b8a80,
bb=0x7f14940717e0) at ssl_engine_io.c:2037
  #6  0x00000000005ef937 in pass_output (flush=1, io=0x7f1494049af0) at
h2_c1_io.c:281
  #7  h2_c1_io_assure_flushed (io=io@entry=0x7f1494049af0) at h2_c1_io.c:340
  #8  0x00000000005dc9f0 in h2_session_shutdown_notice (session=0x7f1494049ab0)
at h2_session.c:735
  #9  h2_session_ev_mpm_stopping (arg=0, msg=0x0, session=0x7f1494049ab0) at
h2_session.c:1524
  #10 h2_session_dispatch_event (session=0x7f1494049ab0, ev=<optimized out>,
arg=0, msg=0x0) at h2_session.c:1726
  #11 0x00000000005de673 in h2_session_process (session=0x7f1494049ab0,
async=0) at h2_session.c:1798
  #12 0x00000000005cb0a9 in h2_c1_run (c=c@entry=0x7f15d80b8558) at h2_c1.c:130
  #13 0x00000000005cb4bd in h2_c1_hook_process_connection (c=0x7f15d80b8558) at
h2_c1.c:278
  #14 0x00000000004dbd10 in ap_run_process_connection
(c=c@entry=0x7f15d80b8558) at connection.c:42
  #15 0x00000000004dc23e in ap_process_connection (c=c@entry=0x7f15d80b8558,
csd=csd@entry=0x7f15d80b8340) at connection.c:217
  #16 0x00000000008212db in process_socket (bucket_alloc=0x7f1494013108,
my_thread_num=8, my_child_num=102, sock=0x7f15d80b8340, p=0x7f15d80b82b8,
thd=0x2e74528) at worker.c:485
  #17 worker_thread (thd=0x2e74528, dummy=<optimized out>) at worker.c:814
  #18 0x00007f15ff802ea5 in start_thread () from /lib64/libpthread.so.0
  #19 0x00007f15ff327b0d in clone () from /lib64/libc.so.6
(gdb) frame 1
(gdb) info locals
  apr__d = 0x7f15d80b8af8
  apr__b = 0x7f15d80b8af8
  bucket = 0x7f15d80b8af8
  n = 46
  rv = 0
  vec = <optimized out>
  bytes_written = 361
  i = 3
  offset = 3
(gdb) p *bb
  $1 = {p = 0x7f15d80b82b8, list = {next = 0x7f15d80b8af8, prev =
0x7f15d80b8af8}, bucket_alloc = 0x7f1494013108}
(gdb) p *bucket
  $2 = {link = {next = 0x7f15d80b8af8, prev = 0x7f15d80b8af8}, type =
0x7f1494013108, length = 139731795674416, start = 139731795675888, data =
0x92e640 <brigade_cleanup>, free = 0x987aa0 <apr_pool_cleanup_null>, list =
0x7f15d80b8b28}

Greetings,
Barnim

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #6 from Barnim Dzwillo <dz...@strato.de> ---
I added a version of the first 'check'-patch for the 2.4.57 tree now.
Hope this helps..

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #1 from Barnim Dzwillo <dz...@strato.de> ---
Created attachment 38586
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=38586&action=edit
workaround for h2eos bucket corruption

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #7 from Barnim Dzwillo <dz...@strato.de> ---
Created attachment 38588
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=38588&action=edit
combined check and workaround for h2eos bucket corruption (for 2.4.57)

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #3 from Stefan Eissing <st...@eissing.org> ---
The first patch, the check, does not apply to your 2.4.x sources. There is a
"delete_meta_bucket()" function referenced that does not exist.

Is this some other change by your not included or do you have a modified source
base?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #8 from Stefan Eissing <st...@eissing.org> ---
Thanks Barnim,

I was able to reproduce the problem and write a fix. Added as r1910386. Also,
this available in <https://github.com/icing/mod_h2/releases/tag/v2.0.19>.

Looking forward to hear how this runs on your systems.

Again, thanks for all the details provided which made this fix possible.

Stefan


Background:

What you were seeing were the results of a recursive call into the connection
output filters. The destruction of the H2EOS bucket for the last request on the
connection could trigger an immediate connection shutdown.

This shutdown, on certain conditions, attempted to write more data onto the
connection *while the output on that connection* was still being handled.

This led to the effect you described that the buckets after the H2EOS became
corrupted. This happened because the *next had been cleaned up in the recursive
call that was made.

The fix does two things: it removes the event handling that triggered the
second write attempt and it also adds a check to the output handling so that
recursive invocations are prevented.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #2 from Stefan Eissing <st...@eissing.org> ---
Thanks for the report with the detailed analysis and patched.

I'll try to analyse what the root cause here is.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

Christophe JAILLET <ch...@wanadoo.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #11 from Christophe JAILLET <ch...@wanadoo.fr> ---
This was backported in 2.4.x branch in r1910699 and is part of version 2.4.58.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #4 from Barnim Dzwillo <dz...@strato.de> ---
I rebased the patch to httpd-trunk from the svn repo:
https://svn.apache.org/repos/asf/httpd/httpd/trunk

In the 2.4.57 version just 'apr_bucket_delete(bucket)'
is called instead of 'delete_meta_bucket(bucket)'.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #10 from Barnim Dzwillo <dz...@strato.de> ---
Hello,

the patched httpd runs without problems for nearly one day now. It didn't
crash and didn't trigger the debug-messages, so the bugfix seems to work.
Thank you again.

It would be nice if this could be included in the next httpd release.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #5 from Barnim Dzwillo <dz...@strato.de> ---
Created attachment 38587
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=38587&action=edit
check for h2eos bucket corruption (for 2.4.57 instead of trunk)

same as the first patch - applies to svn 2.4.57 instead of trunk tree

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 66646] apache http2 h2eos bucket corruption in httpd-2.4.55/56/57 (with workaround)

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66646

--- Comment #9 from Barnim Dzwillo <dz...@strato.de> ---
Thank you for the fast feedback Stefan.

I applied your patch, disabled my workaround and kept my check-code active.

This runs for 30 minutes now, and nether my check nor the 'Denied to prevent..'
check from your patch triggered. I assume that the 'goto cleanup' after
APR_BIGRADE_EMPTY-check is the real fix, and it is a good sign that the
checks are quiet.

I will let you know tomorrow if this keeps working for longer time period.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org