You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/10 08:55:47 UTC

[GitHub] [arrow] kiszk opened a new pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms

kiszk opened a new pull request #7145:
URL: https://github.com/apache/arrow/pull/7145


   This PR adds support of multiple-word operation on big-endian platforms. There are optimized code to concat multiple words into one word with bit shift operations. The current code assumes a little-endian platform.   
   This code support bit-endian by adding conversions between little-endian and big-endian. This is because the shift operations assume the little-endian layout. It is easy to implement by adding conversions.
   
   With #7136 and this PR, the following failures in arrow-utility-test will be fixed.
   ```
   [  FAILED  ] Bitmap.VisitWordsAnd
   [  FAILED  ] Bitmap.ShiftingWordsOptimization
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7145:
URL: https://github.com/apache/arrow/pull/7145#issuecomment-626295954


   https://issues.apache.org/jira/browse/ARROW-8756


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7145:
URL: https://github.com/apache/arrow/pull/7145#discussion_r422986548



##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -1120,22 +1120,28 @@ TEST(BitUtil, BitsetStack) {
 // test the basic assumption of word level Bitmap::Visit
 TEST(Bitmap, ShiftingWordsOptimization) {
   // single word
+  // this test assumes the little endian memory layout in *bytes
+  // Therefore, convert data layout of word to little-endian on big-endian platform
   {
     uint64_t word;
     auto bytes = reinterpret_cast<uint8_t*>(&word);
     constexpr size_t kBitWidth = sizeof(word) * 8;
 
     for (int seed = 0; seed < 64; ++seed) {
       random_bytes(sizeof(word), seed, bytes);
+      uint64_t original_word = word;

Review comment:
       Wouldn't it be simpler to write `uint64_t native_word = FromLittleEndian(word)`?

##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -1144,27 +1150,38 @@ TEST(Bitmap, ShiftingWordsOptimization) {
   }
 
   // two words
+  // this test assumes the little endian memory layout in *bytes
+  // Therefore, convert data layout of words to little-endian on big-endian platform
   {
     uint64_t words[2];
     auto bytes = reinterpret_cast<uint8_t*>(words);
     constexpr size_t kBitWidth = sizeof(words[0]) * 8;
 
     for (int seed = 0; seed < 64; ++seed) {
       random_bytes(sizeof(words), seed, bytes);
+      uint64_t original_words0 = words[0];

Review comment:
       Same here.

##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -990,8 +990,16 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
           if (offsets[i] == 0) {
             visited_words[i] = words[i][word_i];
           } else {
+#if ARROW_LITTLE_ENDIAN
             visited_words[i] = words[i][word_i] >> offsets[i];
             visited_words[i] |= words[i][word_i + 1] << (kBitWidth - offsets[i]);
+#else
+            auto words0 = BitUtil::ByteSwap(words[i][word_i]);
+            auto words1 = BitUtil::ByteSwap(words[i][word_i + 1]);
+            auto visited_word =
+                (words0 >> offsets[i]) | (words1 << (kBitWidth - offsets[i]));
+            visited_words[i] = BitUtil::ByteSwap(visited_word);
+#endif

Review comment:
       Instead of separate code paths, can we just use `FromLittleEndian` and `ToLittleEndian`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org