You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/08 07:10:35 UTC

[GitHub] [arrow] cyb70289 opened a new pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

cyb70289 opened a new pull request #7373:
URL: https://github.com/apache/arrow/pull/7373


   This patch removes "restore_trailing_bits" template parameter of
   TransferBitmap class. Trailing bits are now always not clobbered,
   which is no harm. It also refines trailing bits processing to keep
   the performance influence trivial. Besides, this patch replaces
   "invert_bits" boolean parameter with enum to allow explicit naming.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#issuecomment-641675671


   s390 ci failure is not related, it says "no space left on device".
   appveyor ci failed with many zlib and lz4 runtime errors.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7373:
URL: https://github.com/apache/arrow/pull/7373


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#discussion_r437098352



##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t trail_mask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       done

##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t kTrailingBitmask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       Ah, it's a lookup table, cool. Will change.

##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t kTrailingBitmask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       We need a table {255, 127, 63, 31, 15, 7, 3, 1}, looks there's no such lookup table available.
   Shall I add a new table or keep the code as is (revert the naming)?
   https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util.h#L420

##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t kTrailingBitmask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       We need a table {255, 127, 63, 31, 15, 7, 3, 1}, looks there's no such lookup table available.
   Shall I add a new table or keep the code as is (revert the renaming)?
   https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util.h#L420




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#issuecomment-640415175


   https://issues.apache.org/jira/browse/ARROW-8974


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#discussion_r436733347



##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t trail_mask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       Can probably use `kTrailingBitmask` or `kPrecedingWrappingBitmask`.

##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t kTrailingBitmask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       I meant use the `kTrailingBitmask` constant array that's exposed in `bit_util.h` :-)

##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t kTrailingBitmask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       Hmm, let's simply revert the renaming then :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#discussion_r437805418



##########
File path: cpp/src/arrow/util/bitmap_ops.cc
##########
@@ -298,52 +297,52 @@ void TransferBitmap(const uint8_t* data, int64_t offset, int64_t length,
     auto nwords = reader.words();
     while (nwords--) {
       auto word = reader.NextWord();
-      writer.PutNextWord(invert_bits ? ~word : word);
+      writer.PutNextWord(mode == TransferMode::Invert ? ~word : word);
     }
     auto nbytes = reader.trailing_bytes();
     while (nbytes--) {
       int valid_bits;
       auto byte = reader.NextTrailingByte(valid_bits);
-      writer.PutNextTrailingByte(invert_bits ? ~byte : byte, valid_bits);
+      writer.PutNextTrailingByte(mode == TransferMode::Invert ? ~byte : byte, valid_bits);
     }
-  } else {
-    // Shift dest by its byte offset
-    dest += dest_byte_offset;
+  } else if (length) {
+    int64_t num_bytes = BitUtil::BytesForBits(length);
+
+    // Shift by its byte offset
+    data += offset / 8;
+    dest += dest_offset / 8;
 
     // Take care of the trailing bits in the last byte
+    // E.g., if trailing_bits = 5, last byte should be
+    // - low  3 bits: new bits from last byte of data buffer
+    // - high 5 bits: old bits from last byte of dest buffer
     int64_t trailing_bits = num_bytes * 8 - length;
-    uint8_t trail = 0;
-    if (trailing_bits && restore_trailing_bits) {
-      trail = dest[num_bytes - 1];
-    }
+    uint8_t kTrailingBitmask = (1U << (8 - trailing_bits)) - 1;

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#issuecomment-640996230


   RTools CI failure looks not related
   https://github.com/apache/arrow/pull/7373/checks?check_run_id=752286352#step:7:1077
   `error: failed retrieving file 'mingw-w64-i686-thrift-0.13.0-8000-any.pkg.tar.xz' from dl.bintray.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#issuecomment-641927363


   Thank you @cyb70289 !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#issuecomment-641869288


   Rebased to (hopefully) fix AppVeyor.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7373: ARROW-8974: [C++] Simplify TransferBitmap

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7373:
URL: https://github.com/apache/arrow/pull/7373#issuecomment-641684149


   the Appveyor failures are ARROW-9085


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org