You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/10 08:55:47 UTC
[GitHub] [arrow] kiszk opened a new pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms
kiszk opened a new pull request #7145:
URL: https://github.com/apache/arrow/pull/7145
This PR adds support of multiple-word operation on big-endian platforms. There are optimized code to concat multiple words into one word with bit shift operations. The current code assumes a little-endian platform.
This code support bit-endian by adding conversions between little-endian and big-endian. This is because the shift operations assume the little-endian layout. It is easy to implement by adding conversions.
With #7136 and this PR, the following failures in arrow-utility-test will be fixed.
```
[ FAILED ] Bitmap.VisitWordsAnd
[ FAILED ] Bitmap.ShiftingWordsOptimization
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7145:
URL: https://github.com/apache/arrow/pull/7145#issuecomment-626295954
https://issues.apache.org/jira/browse/ARROW-8756
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7145:
URL: https://github.com/apache/arrow/pull/7145#discussion_r422986548
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -1120,22 +1120,28 @@ TEST(BitUtil, BitsetStack) {
// test the basic assumption of word level Bitmap::Visit
TEST(Bitmap, ShiftingWordsOptimization) {
// single word
+ // this test assumes the little endian memory layout in *bytes
+ // Therefore, convert data layout of word to little-endian on big-endian platform
{
uint64_t word;
auto bytes = reinterpret_cast<uint8_t*>(&word);
constexpr size_t kBitWidth = sizeof(word) * 8;
for (int seed = 0; seed < 64; ++seed) {
random_bytes(sizeof(word), seed, bytes);
+ uint64_t original_word = word;
Review comment:
Wouldn't it be simpler to write `uint64_t native_word = FromLittleEndian(word)`?
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -1144,27 +1150,38 @@ TEST(Bitmap, ShiftingWordsOptimization) {
}
// two words
+ // this test assumes the little endian memory layout in *bytes
+ // Therefore, convert data layout of words to little-endian on big-endian platform
{
uint64_t words[2];
auto bytes = reinterpret_cast<uint8_t*>(words);
constexpr size_t kBitWidth = sizeof(words[0]) * 8;
for (int seed = 0; seed < 64; ++seed) {
random_bytes(sizeof(words), seed, bytes);
+ uint64_t original_words0 = words[0];
Review comment:
Same here.
##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -990,8 +990,16 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
if (offsets[i] == 0) {
visited_words[i] = words[i][word_i];
} else {
+#if ARROW_LITTLE_ENDIAN
visited_words[i] = words[i][word_i] >> offsets[i];
visited_words[i] |= words[i][word_i + 1] << (kBitWidth - offsets[i]);
+#else
+ auto words0 = BitUtil::ByteSwap(words[i][word_i]);
+ auto words1 = BitUtil::ByteSwap(words[i][word_i + 1]);
+ auto visited_word =
+ (words0 >> offsets[i]) | (words1 << (kBitWidth - offsets[i]));
+ visited_words[i] = BitUtil::ByteSwap(visited_word);
+#endif
Review comment:
Instead of separate code paths, can we just use `FromLittleEndian` and `ToLittleEndian`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org