Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/27 10:12:47 UTC

[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #716: Optimize array::transform::utils::set_bits

jhorstmann commented on a change in pull request #716:
URL: https://github.com/apache/arrow-rs/pull/716#discussion_r697322266



##########
File path: arrow/src/array/transform/utils.rs
##########
@@ -35,15 +42,37 @@ pub(super) fn set_bits(
     offset_read: usize,
     len: usize,
 ) -> usize {
-    let mut count = 0;
-    (0..len).for_each(|i| {
-        if bit_util::get_bit(data, offset_read + i) {
-            bit_util::set_bit(write_data, offset_write + i);
-        } else {
-            count += 1;
-        }
+    let mut null_count = 0;
+
+    let mut bits_to_align = offset_write % 8;
+    if bits_to_align > 0 {
+        bits_to_align = std::cmp::min(len, 8 - bits_to_align);
+    }
+    let mut byte_index = ceil(offset_write + bits_to_align, 8);
+
+    // Set full bytes provided by bit chunk iterator
+    let chunks = BitChunks::new(data, offset_read + bits_to_align, len - bits_to_align);
+    chunks.iter().for_each(|chunk| {
+        null_count += chunk.count_zeros();
+        chunk.to_ne_bytes().iter().for_each(|b| {

Review comment:
       I think this needs to use `to_le_bytes` rather than `to_ne_bytes`, which only gives the same byte order on little-endian targets. For an example, see the comment in `bitwise_bin_op_helper` in `ops.rs` (which has some typos):
   
       // we are counting bits starting from the least significant bit, so to_le_bytes should be correct
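To illustrate the point, here is a minimal sketch, not taken from the PR. It assumes each `u64` chunk holds the first bit of the bitmap in its least significant bit, as the quoted comment describes. The helper names (`bit_in_chunk`, `bit_in_bytes`, `le_bytes_preserve_bit_order`, `be_bytes_preserve_bit_order`) are hypothetical and exist only for this example. Since `to_ne_bytes` behaves like `to_be_bytes` on a big-endian target, the big-endian case is simulated explicitly:

```rust
/// Hypothetical helper: bit `i` of a chunk, counted from the least significant bit.
fn bit_in_chunk(chunk: u64, i: usize) -> bool {
    (chunk >> i) & 1 == 1
}

/// Hypothetical helper: bit `i` of a byte slice, where bit 0 is the least
/// significant bit of byte 0 (the bit numbering assumed for the bitmap).
fn bit_in_bytes(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// True if writing the chunk with `to_le_bytes` keeps every bit at its index.
fn le_bytes_preserve_bit_order(chunk: u64) -> bool {
    let bytes = chunk.to_le_bytes();
    (0..64).all(|i| bit_in_chunk(chunk, i) == bit_in_bytes(&bytes, i))
}

/// True if writing the chunk with `to_be_bytes` keeps every bit at its index.
/// This is what `to_ne_bytes` would produce on a big-endian target.
fn be_bytes_preserve_bit_order(chunk: u64) -> bool {
    let bytes = chunk.to_be_bytes();
    (0..64).all(|i| bit_in_chunk(chunk, i) == bit_in_bytes(&bytes, i))
}
```

With only bit 0 set (`chunk = 1`), `to_le_bytes` puts the set bit in byte 0, so bit 0 of the written bytes is still set. `to_be_bytes` puts it in byte 7, which would shift it to bit index 56 of the bitmap.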



