You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by db...@apache.org on 2023/07/14 10:25:25 UTC
[impala] branch master updated: IMPALA-12239: BitWidthZeroRepeated seems to be flaky
This is an automated email from the ASF dual-hosted git repository.
dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 14a5e9a20 IMPALA-12239: BitWidthZeroRepeated seems to be flaky
14a5e9a20 is described below
commit 14a5e9a20af59af7a03a9fb480c077d1d8a38187
Author: Daniel Becker <da...@cloudera.com>
AuthorDate: Tue Jun 27 12:05:03 2023 +0200
IMPALA-12239: BitWidthZeroRepeated seems to be flaky
RleTest.BitWidthZeroRepeated seems to be flaky in release builds. A
possible error message is this:
Value of: 0 Expected: val Which is: '\x9F' (159)
Stacktrace
.../Impala/be/src/util/rle-test.cc:410
Value of: 0
Expected: val
Which is: '\x9F' (159)
The problem seems to be around this 'memcpy()' call:
https://github.com/apache/impala/blob/3e9408480c5285ca925576b7486b35593407a32a/be/src/util/bit-stream-utils.inline.h#L237.
We're almost certainly running into undefined behaviour because
- the error only occurs in release mode, not in debug mode (not even in
ASAN or UBSAN mode)
- if we add something around the 'memcpy()', for example printing
'buffer_pos_' or 'v', the error doesn't occur
- the value with which the test fails, i.e. the value it reads instead
of the expected 0 is non-deterministic.
The failure occurs since https://gerrit.cloudera.org/#/c/20073/
(IMPALA-11961/IMPALA-12207: Add Redhat 9 / Ubuntu 22 support), which
upgraded gperftools.
We couldn't find the root cause of the bug but the most probable
possibilities are the following:
- there's a bug in Impala that leads to undefined behaviour and that
manifests in the test failure with the new gperftools version but not
with the old one
- there's a bug in the new gperftools version (2.10).
Downgrading gperftools is not an option because it is necessary for
Redhat 9 and Ubuntu 22. There is a workaround that causes the test to
pass - putting the 'memcpy()' in a conditional block:
if (LIKELY(buffer_pos_ != buffer_end_)) {
memcpy(v, buffer_pos_, num_bytes);
buffer_pos_ += num_bytes;
} else {
...
}
This most probably does not address the root cause, however; for more
see the Jira ticket.
Testing:
- manually verified that the test RleTest.BitWidthZeroRepeated passes
in a release build
Change-Id: I84645271101b4bd594dd25929924a3549c223244
Reviewed-on: http://gerrit.cloudera.org:8080/20187
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
be/src/util/bit-stream-utils.inline.h | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/be/src/util/bit-stream-utils.inline.h b/be/src/util/bit-stream-utils.inline.h
index ca3c28aef..3289d2b79 100644
--- a/be/src/util/bit-stream-utils.inline.h
+++ b/be/src/util/bit-stream-utils.inline.h
@@ -234,8 +234,25 @@ inline bool BatchedBitReader::GetBytes(int num_bytes, T* v) {
DCHECK_LE(num_bytes, sizeof(T));
if (UNLIKELY(buffer_pos_ + num_bytes > buffer_end_)) return false;
*v = 0; // Ensure unset bytes are initialized to zero.
- memcpy(v, buffer_pos_, num_bytes);
- buffer_pos_ += num_bytes;
+
+ // It is possible that we want to "read" when we're at the end of the buffer if for
+ // example we're unpacking values of bitwidth 0, but 'num_bytes' must then be 0. In this
+ // case 'buffer_pos_' == 'buffer_end_'.
+ //
+ // It is not completely clear whether a one-past-the-end pointer (like 'buffer_end_') is
+ // valid when used as an argument to memcpy(), see
+ // https://en.cppreference.com/w/cpp/string/byte/memcpy and
+ // https://stackoverflow.com/questions/29844298/is-it-legal-to-call-memcpy-with-zero-length-on-a-pointer-just-past-the-end-of-an.
+ // Placing the memcpy() call in a conditional block seems to work around the problem
+ // described in IMPALA-12239. It is probably not the solution to the root cause,
+ // however; see the Jira ticket for more details.
+ if (LIKELY(buffer_pos_ != buffer_end_)) {
+ memcpy(v, buffer_pos_, num_bytes);
+ buffer_pos_ += num_bytes;
+ } else {
+ DCHECK_EQ(num_bytes, 0);
+ }
+
return true;
}