You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by db...@apache.org on 2023/07/14 10:25:25 UTC

[impala] branch master updated: IMPALA-12239: BitWidthZeroRepeated seems to be flaky

This is an automated email from the ASF dual-hosted git repository.

dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 14a5e9a20 IMPALA-12239: BitWidthZeroRepeated seems to be flaky
14a5e9a20 is described below

commit 14a5e9a20af59af7a03a9fb480c077d1d8a38187
Author: Daniel Becker <da...@cloudera.com>
AuthorDate: Tue Jun 27 12:05:03 2023 +0200

    IMPALA-12239: BitWidthZeroRepeated seems to be flaky
    
    RleTest.BitWidthZeroRepeated seems to be flaky in release builds. A
    possible error message is this:
    
      Value of: 0 Expected: val Which is: '\x9F' (159)
      Stacktrace
    
      .../Impala/be/src/util/rle-test.cc:410
      Value of: 0
      Expected: val
      Which is: '\x9F' (159)
    
    The problem seems to be around this 'memcpy()' call:
    https://github.com/apache/impala/blob/3e9408480c5285ca925576b7486b35593407a32a/be/src/util/bit-stream-utils.inline.h#L237.
    
    We're almost certainly running into undefined behaviour because
     - the error only occurs in release mode, not in debug mode (not even in
       ASAN or UBSAN mode)
     - if we add something around the 'memcpy()', for example printing
       'buffer_pos_' or 'v', the error doesn't occur
     - the value with which the test fails, i.e. the value it reads instead
       of the expected 0 is non-deterministic.
    
    The failure occurs since https://gerrit.cloudera.org/#/c/20073/
    (IMPALA-11961/IMPALA-12207: Add Redhat 9 / Ubuntu 22 support), which
    upgraded gperftools.
    
    We couldn't find the root cause of the bug but the most probable
    possibilities are the following:
     - there's a bug in Impala that leads to undefined behaviour and that
       manifests in the test failure with the new gperftools version but not
       with the old one
     - there's a bug in the new gperftools version (2.10).
    
    Downgrading gperftools is not an option because it is necessary for
    Redhat 9 and Ubuntu 22. There is a workaround that causes the test to
    pass - putting the 'memcpy()' in a conditional block:
    
      if (LIKELY(buffer_pos_ != buffer_end_)) {
        memcpy(v, buffer_pos_, num_bytes);
        buffer_pos_ += num_bytes;
      } else {
        ...
      }
    
    This most probably does not address the root cause, however; for more
    see the Jira ticket.
    
    Testing:
      - manually verified that the test RleTest.BitWidthZeroRepeated passes
        in a release build
    
    Change-Id: I84645271101b4bd594dd25929924a3549c223244
    Reviewed-on: http://gerrit.cloudera.org:8080/20187
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/util/bit-stream-utils.inline.h | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/be/src/util/bit-stream-utils.inline.h b/be/src/util/bit-stream-utils.inline.h
index ca3c28aef..3289d2b79 100644
--- a/be/src/util/bit-stream-utils.inline.h
+++ b/be/src/util/bit-stream-utils.inline.h
@@ -234,8 +234,25 @@ inline bool BatchedBitReader::GetBytes(int num_bytes, T* v) {
   DCHECK_LE(num_bytes, sizeof(T));
   if (UNLIKELY(buffer_pos_ + num_bytes > buffer_end_)) return false;
   *v = 0; // Ensure unset bytes are initialized to zero.
-  memcpy(v, buffer_pos_, num_bytes);
-  buffer_pos_ += num_bytes;
+
+  // It is possible that we want to "read" when we're at the end of the buffer if for
+  // example we're unpacking values of bitwidth 0, but 'num_bytes' must then be 0. In this
+  // case 'buffer_pos_' == 'buffer_end_'.
+  //
+  // It is not completely clear whether a one-past-the-end pointer (like 'buffer_end_') is
+  // valid when used as an argument to memcpy(), see
+  // https://en.cppreference.com/w/cpp/string/byte/memcpy and
+  // https://stackoverflow.com/questions/29844298/is-it-legal-to-call-memcpy-with-zero-length-on-a-pointer-just-past-the-end-of-an.
+  // Placing the memcpy() call in a conditional block seems to work around the problem
+  // described in IMPALA-12239. It is probably not the solution to the root cause,
+  // however; see the Jira ticket for more details.
+  if (LIKELY(buffer_pos_ != buffer_end_)) {
+    memcpy(v, buffer_pos_, num_bytes);
+    buffer_pos_ += num_bytes;
+  } else {
+    DCHECK_EQ(num_bytes, 0);
+  }
+
   return true;
 }