You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/24 18:30:27 UTC

[GitHub] [arrow] jhorstmann opened a new pull request #8262: ARROW-10040 Iterate over and combine boolean buffers with arbitrary offsets

jhorstmann opened a new pull request #8262:
URL: https://github.com/apache/arrow/pull/8262


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501273876



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -371,118 +388,165 @@ where
 
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))
         .for_each(|(res, (left, right))| {
-            *res = op(*left, *right);
+            *res = op(left, right);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits(), right_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];
+    result
+        .write_all(rem)
+        .expect("not enough capacity in buffer");
+
     result.freeze()
 }
 
 fn bitwise_unary_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
-    len: usize,
+    offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8) -> u8,
+    F: Fn(u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(left.data()[left_offset..].iter())
+    let left_chunks = left.bit_chunks(offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter())
         .for_each(|(res, left)| {
-            *res = op(*left);
+            *res = op(left);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];

Review comment:
       @nevi-me Talking about endianness, I'm only about 85% sure that `to_le_bytes` is correct here instead of `to_ne_bytes`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501772813



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       Out of curiosity, which toolkit or procedure do you guys use for looking at the assembly? I am curious (amazed) how you end up on that level of detail!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501681537



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -254,6 +256,21 @@ impl Buffer {
         )
     }
 
+    /// Returns a slice of this buffer starting at a certain bit offset.
+    /// If the offset is byte-aligned the returned buffer is a shallow clone,
+    /// otherwise a new buffer is allocated and filled with a copy of the bits in the range.
+    pub fn bit_slice(&self, offset: usize, len: usize) -> Self {
+        if offset % 8 == 0 && len % 8 == 0 {
+            return self.slice(offset / 8);
+        }
+
+        bitwise_unary_op_helper(&self, offset, len, |a| a)
+    }
+
+    pub fn bit_chunks(&self, offset: usize, len: usize) -> BitChunks {

Review comment:
       May you please add a docstring here

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       Amirite that here we're?:
   - indexing into left and right at size u64 chunks
   - creating a `result_chunks` of same output len in bytes (so len / 8)
   - iterating through the l&r chunks, and writing the result of `op(*)` to the `result_chunks`, which becomes aligned
   - doing the same with `remainder_bytes`.
   
   

##########
File path: rust/arrow/src/util/bit_chunk_iterator.rs
##########
@@ -0,0 +1,223 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+use crate::buffer::Buffer;
+use crate::util::bit_util::ceil;
+use std::fmt::Debug;
+
+#[derive(Debug)]
+pub struct BitChunks<'a> {
+    buffer: &'a Buffer,
+    raw_data: *const u8,
+    offset: usize,
+    chunk_len: usize,
+    remainder_len: usize,
+}
+
+impl<'a> BitChunks<'a> {
+    pub fn new(buffer: &'a Buffer, offset: usize, len: usize) -> Self {
+        assert!(ceil(offset + len, 8) <= buffer.len() * 8);

Review comment:
       Might be helpful to add a failure message to this assertion, also helps with testing sometimes

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))
         .for_each(|(res, (left, right))| {
-            *res = op(*left, *right);
+            *res = op(left, right);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits(), right_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];
+    result
+        .write_all(rem)
+        .expect("not enough capacity in buffer");
+
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to one input and return the result as a Buffer.
+/// The input is treated as a bitmap, meaning that offset and length are specified in number of bits.
 fn bitwise_unary_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
-    len: usize,
+    offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8) -> u8,
+    F: Fn(u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(left.data()[left_offset..].iter())
+    let left_chunks = left.bit_chunks(offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter())
         .for_each(|(res, left)| {
-            *res = op(*left);
+            *res = op(left);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];
+    result
+        .write_all(rem)
+        .expect("not enough capacity in buffer");
+
     result.freeze()
 }
 
 pub(super) fn buffer_bin_and(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
 ) -> Buffer {
-    // SIMD implementation if available
+    // SIMD implementation if available and byte-aligned

Review comment:
       Seems like a reasonable tradeoff, that if an array's slice isn't byte aligned, it might end up being processed on the slower path




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501769149



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       Yes, that sounds correct. The result buffer is newly created and properly aligned for the largest primitive types.
   
   The `with_bitset` call sets the len of the buffer to multiple of 8bytes / 64bits, as otherwise the `typed_data_mut` function would complain. The capacity of the buffer is however large enough to also contain the remainder bytes.
   
   There are some comparison opcodes left in the assembly, I couldn't find a way to convince the compiler that all iterators have the same length. Still it seems to be not much slower than the simd version.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501681537



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -254,6 +256,21 @@ impl Buffer {
         )
     }
 
+    /// Returns a slice of this buffer starting at a certain bit offset.
+    /// If the offset is byte-aligned the returned buffer is a shallow clone,
+    /// otherwise a new buffer is allocated and filled with a copy of the bits in the range.
+    pub fn bit_slice(&self, offset: usize, len: usize) -> Self {
+        if offset % 8 == 0 && len % 8 == 0 {
+            return self.slice(offset / 8);
+        }
+
+        bitwise_unary_op_helper(&self, offset, len, |a| a)
+    }
+
+    pub fn bit_chunks(&self, offset: usize, len: usize) -> BitChunks {

Review comment:
       May you please add a docstring here

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       Amirite that here we're?:
   - indexing into left and right at size u64 chunks
   - creating a `result_chunks` of same output len in bytes (so len / 8)
   - iterating through the l&r chunks, and writing the result of `op(*)` to the `result_chunks`, which becomes aligned
   - doing the same with `remainder_bytes`.
   
   

##########
File path: rust/arrow/src/util/bit_chunk_iterator.rs
##########
@@ -0,0 +1,223 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+use crate::buffer::Buffer;
+use crate::util::bit_util::ceil;
+use std::fmt::Debug;
+
+#[derive(Debug)]
+pub struct BitChunks<'a> {
+    buffer: &'a Buffer,
+    raw_data: *const u8,
+    offset: usize,
+    chunk_len: usize,
+    remainder_len: usize,
+}
+
+impl<'a> BitChunks<'a> {
+    pub fn new(buffer: &'a Buffer, offset: usize, len: usize) -> Self {
+        assert!(ceil(offset + len, 8) <= buffer.len() * 8);

Review comment:
       Might be helpful to add a failure message to this assertion, also helps with testing sometimes

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))
         .for_each(|(res, (left, right))| {
-            *res = op(*left, *right);
+            *res = op(left, right);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits(), right_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];
+    result
+        .write_all(rem)
+        .expect("not enough capacity in buffer");
+
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to one input and return the result as a Buffer.
+/// The input is treated as a bitmap, meaning that offset and length are specified in number of bits.
 fn bitwise_unary_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
-    len: usize,
+    offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8) -> u8,
+    F: Fn(u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(left.data()[left_offset..].iter())
+    let left_chunks = left.bit_chunks(offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter())
         .for_each(|(res, left)| {
-            *res = op(*left);
+            *res = op(left);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];
+    result
+        .write_all(rem)
+        .expect("not enough capacity in buffer");
+
     result.freeze()
 }
 
 pub(super) fn buffer_bin_and(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
 ) -> Buffer {
-    // SIMD implementation if available
+    // SIMD implementation if available and byte-aligned

Review comment:
       Seems like a reasonable tradeoff, that if an array's slice isn't byte aligned, it might end up being processed on the slower path




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8262:
URL: https://github.com/apache/arrow/pull/8262


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501813151



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       I don't really have fancy tooling for that. For isolated functions, the [compiler explorer][1] is very useful. When looking at benchmarks with bigger dependencies I use the `objdump` command which I think comes with linux by default. So I run a benchmark, notice the executable it is running and then do something like
   
   ```
   objdump -M intel -S target/release/deps/buffer_bit_ops-a5cda2cafee5a9f9 | less
   ```
   
   And then try to find my method in there. Sometimes it helps to give the method a unique name (`sum` for a kernel was hard to find) and to mark the method I'm interested in as `#inline(never)`. Reading and understanding the assembly then is the tricky part. I usually expect a certain instruction sequence in the innermost loop that I'm looking for, here for example a combination of left shift / right shift (from the bit slice iterator) and binary and.
   
   Vector instructions are usually a bit easier to spot since they are not that common. A good heuristic is also that simpler + shorter instruction sequences = faster.
   
   I haven't tried it yet, but a reverse engineering tool like [Ghidra][2] could be useful to more easily analyze loops.
   
    [1]: https://rust.godbolt.org/
    [2]: https://ghidra-sre.org/




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501255235



##########
File path: rust/arrow/src/util/bit_chunk_iterator.rs
##########
@@ -0,0 +1,223 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+use crate::buffer::Buffer;
+use crate::util::bit_util::ceil;
+use std::fmt::Debug;
+
+#[derive(Debug)]
+pub struct BitChunks<'a> {
+    buffer: &'a Buffer,
+    raw_data: *const u8,
+    offset: usize,
+    chunk_len: usize,
+    remainder_len: usize,
+}
+
+impl<'a> BitChunks<'a> {
+    pub fn new(buffer: &'a Buffer, offset: usize, len: usize) -> Self {
+        assert!(ceil(offset + len, 8) <= buffer.len() * 8);
+
+        let byte_offset = offset / 8;
+        let offset = offset % 8;
+
+        let raw_data = unsafe { buffer.raw_data().add(byte_offset) };
+
+        let chunk_bits = 64;
+
+        let chunk_len = len / chunk_bits;
+        let remainder_len = len & (chunk_bits - 1);
+
+        BitChunks::<'a> {
+            buffer: &buffer,
+            raw_data,
+            offset,
+            chunk_len,
+            remainder_len,
+        }
+    }
+}
+
+#[derive(Debug)]
+pub struct BitChunkIterator<'a> {
+    buffer: &'a Buffer,
+    raw_data: *const u8,
+    offset: usize,
+    chunk_len: usize,
+    index: usize,
+}
+
+impl<'a> BitChunks<'a> {
+    #[inline]
+    pub fn remainder_len(&self) -> usize {
+        self.remainder_len
+    }
+
+    #[inline]
+    pub fn remainder_bits(&self) -> u64 {
+        let bit_len = self.remainder_len;
+        if bit_len == 0 {
+            0
+        } else {
+            let byte_len = ceil(bit_len, 8);
+
+            let mut bits = 0;
+            for i in 0..byte_len {
+                let byte = unsafe {
+                    std::ptr::read(
+                        self.raw_data
+                            .add(self.chunk_len * std::mem::size_of::<u64>() + i),
+                    )
+                };
+                bits |= (byte as u64) << (i * 8);
+            }
+
+            let offset = self.offset as u64;
+
+            (bits >> offset) & ((1 << bit_len) - 1)
+        }
+    }
+
+    #[inline]
+    pub fn iter(&self) -> BitChunkIterator<'a> {
+        BitChunkIterator::<'a> {
+            buffer: self.buffer,
+            raw_data: self.raw_data,
+            offset: self.offset,
+            chunk_len: self.chunk_len,
+            index: 0,
+        }
+    }
+}
+
+impl<'a> IntoIterator for BitChunks<'a> {
+    type Item = u64;
+    type IntoIter = BitChunkIterator<'a>;
+
+    fn into_iter(self) -> Self::IntoIter {
+        self.iter()
+    }
+}
+
+impl Iterator for BitChunkIterator<'_> {
+    type Item = u64;
+
+    #[inline]
+    fn next(&mut self) -> Option<u64> {
+        if self.index >= self.chunk_len {
+            return None;
+        }
+
+        // cast to *const u64 should be fine since we are using read_unaligned
+        #[allow(clippy::cast_ptr_alignment)]
+        let current = unsafe {
+            std::ptr::read_unaligned((self.raw_data as *const u64).add(self.index))
+        };
+
+        let combined = if self.offset == 0 {
+            current
+        } else {
+            // cast to *const u64 should be fine since we are using read_unaligned
+            #[allow(clippy::cast_ptr_alignment)]
+            let next = unsafe {
+                std::ptr::read_unaligned(
+                    (self.raw_data as *const u64).add(self.index + 1),
+                )
+            };
+            current >> self.offset
+                | (next & ((1 << self.offset) - 1)) << (64 - self.offset)
+        };
+
+        self.index += 1;
+
+        Some(combined)
+    }
+
+    #[inline]
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        (
+            self.chunk_len - self.index,
+            Some(self.chunk_len - self.index),
+        )
+    }
+}
+
+impl ExactSizeIterator for BitChunkIterator<'_> {
+    #[inline]
+    fn len(&self) -> usize {
+        self.chunk_len - self.index
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use crate::buffer::Buffer;
+
+    #[test]
+    fn test_iter_aligned() {

Review comment:
       I don't understand this test :( Why are the values getting reversed? Does this have to do with endianness?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8262: ARROW-10040 Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#issuecomment-698514562


   https://issues.apache.org/jira/browse/ARROW-10040


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501772813



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       Out of curiosity, which toolkit or procedure do you guys use for looking at the assembly? I am curious (amazed) how you end up on that level of detail!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501769149



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       Yes, that sounds correct. The result buffer is newly created and properly aligned for the largest primitive types.
   
   The `with_bitset` call sets the len of the buffer to multiple of 8bytes / 64bits, as otherwise the `typed_data_mut` function would complain. The capacity of the buffer is however large enough to also contain the remainder bytes.
   
   There are some comparison opcodes left in the assembly, I couldn't find a way to convince the compiler that all iterators have the same length. Still it seems to be not much slower than the simd version.

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -369,120 +394,171 @@ where
     result.freeze()
 }
 
+/// Apply a bitwise operation `op` to two inputs and return the result as a Buffer.
+/// The inputs are treated as bitmaps, meaning that offsets and length are specified in number of bits.
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))

Review comment:
       I don't really have fancy tooling for that. For isolated functions, the [compiler explorer][1] is very useful. When looking at benchmarks with bigger dependencies I use the `objdump` command which I think comes with linux by default. So I run a benchmark, notice the executable it is running and then do something like
   
   ```
   objdump -M intel -S target/release/deps/buffer_bit_ops-a5cda2cafee5a9f9 | less
   ```
   
   And then try to find my method in there. Sometimes it helps to give the method a unique name (`sum` for a kernel was hard to find) and to mark the method I'm interested in as `#inline(never)`. Reading and understanding the assembly then is the tricky part. I usually expect a certain instruction sequence in the innermost loop that I'm looking for, here for example a combination of left shift / right shift (from the bit slice iterator) and binary and.
   
   Vector instructions are usually a bit easier to spot since they are not that common. A good heuristic is also that simpler + shorter instruction sequences = faster.
   
   I haven't tried it yet, but a reverse engineering tool like [Ghidra][2] could be useful to more easily analyze loops.
   
    [1]: https://rust.godbolt.org/
    [2]: https://ghidra-sre.org/




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8262:
URL: https://github.com/apache/arrow/pull/8262


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8262: ARROW-10040 Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#issuecomment-698514562


   https://issues.apache.org/jira/browse/ARROW-10040


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501349134



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -371,118 +388,165 @@ where
 
 fn bitwise_bin_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
+    left_offset_in_bits: usize,
     right: &Buffer,
-    right_offset: usize,
-    len: usize,
+    right_offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8, u8) -> u8,
+    F: Fn(u64, u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(
-            left.data()[left_offset..]
-                .iter()
-                .zip(right.data()[right_offset..].iter()),
-        )
+    let left_chunks = left.bit_chunks(left_offset_in_bits, len_in_bits);
+    let right_chunks = right.bit_chunks(right_offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter().zip(right_chunks.iter()))
         .for_each(|(res, (left, right))| {
-            *res = op(*left, *right);
+            *res = op(left, right);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits(), right_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];
+    result
+        .write_all(rem)
+        .expect("not enough capacity in buffer");
+
     result.freeze()
 }
 
 fn bitwise_unary_op_helper<F>(
     left: &Buffer,
-    left_offset: usize,
-    len: usize,
+    offset_in_bits: usize,
+    len_in_bits: usize,
     op: F,
 ) -> Buffer
 where
-    F: Fn(u8) -> u8,
+    F: Fn(u64) -> u64,
 {
-    let mut result = MutableBuffer::new(len).with_bitset(len, false);
+    // reserve capacity and set length so we can get a typed view of u64 chunks
+    let mut result =
+        MutableBuffer::new(ceil(len_in_bits, 8)).with_bitset(len_in_bits / 64 * 8, false);
 
-    result
-        .data_mut()
-        .iter_mut()
-        .zip(left.data()[left_offset..].iter())
+    let left_chunks = left.bit_chunks(offset_in_bits, len_in_bits);
+    let result_chunks = result.typed_data_mut::<u64>().iter_mut();
+
+    result_chunks
+        .zip(left_chunks.iter())
         .for_each(|(res, left)| {
-            *res = op(*left);
+            *res = op(left);
         });
 
+    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
+    let rem = op(left_chunks.remainder_bits());
+    let rem = &rem.to_le_bytes()[0..remainder_bytes];

Review comment:
       We always use `to_le_bytes`, so I think it's fine. We don't yet support BE




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#discussion_r501270867



##########
File path: rust/arrow/src/util/bit_chunk_iterator.rs
##########
@@ -0,0 +1,223 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+use crate::buffer::Buffer;
+use crate::util::bit_util::ceil;
+use std::fmt::Debug;
+
+#[derive(Debug)]
+pub struct BitChunks<'a> {
+    buffer: &'a Buffer,
+    raw_data: *const u8,
+    offset: usize,
+    chunk_len: usize,
+    remainder_len: usize,
+}
+
+impl<'a> BitChunks<'a> {
+    pub fn new(buffer: &'a Buffer, offset: usize, len: usize) -> Self {
+        assert!(ceil(offset + len, 8) <= buffer.len() * 8);
+
+        let byte_offset = offset / 8;
+        let offset = offset % 8;
+
+        let raw_data = unsafe { buffer.raw_data().add(byte_offset) };
+
+        let chunk_bits = 64;
+
+        let chunk_len = len / chunk_bits;
+        let remainder_len = len & (chunk_bits - 1);
+
+        BitChunks::<'a> {
+            buffer: &buffer,
+            raw_data,
+            offset,
+            chunk_len,
+            remainder_len,
+        }
+    }
+}
+
+#[derive(Debug)]
+pub struct BitChunkIterator<'a> {
+    buffer: &'a Buffer,
+    raw_data: *const u8,
+    offset: usize,
+    chunk_len: usize,
+    index: usize,
+}
+
+impl<'a> BitChunks<'a> {
+    #[inline]
+    pub fn remainder_len(&self) -> usize {
+        self.remainder_len
+    }
+
+    #[inline]
+    pub fn remainder_bits(&self) -> u64 {
+        let bit_len = self.remainder_len;
+        if bit_len == 0 {
+            0
+        } else {
+            let byte_len = ceil(bit_len, 8);
+
+            let mut bits = 0;
+            for i in 0..byte_len {
+                let byte = unsafe {
+                    std::ptr::read(
+                        self.raw_data
+                            .add(self.chunk_len * std::mem::size_of::<u64>() + i),
+                    )
+                };
+                bits |= (byte as u64) << (i * 8);
+            }
+
+            let offset = self.offset as u64;
+
+            (bits >> offset) & ((1 << bit_len) - 1)
+        }
+    }
+
+    #[inline]
+    pub fn iter(&self) -> BitChunkIterator<'a> {
+        BitChunkIterator::<'a> {
+            buffer: self.buffer,
+            raw_data: self.raw_data,
+            offset: self.offset,
+            chunk_len: self.chunk_len,
+            index: 0,
+        }
+    }
+}
+
+impl<'a> IntoIterator for BitChunks<'a> {
+    type Item = u64;
+    type IntoIter = BitChunkIterator<'a>;
+
+    fn into_iter(self) -> Self::IntoIter {
+        self.iter()
+    }
+}
+
+impl Iterator for BitChunkIterator<'_> {
+    type Item = u64;
+
+    #[inline]
+    fn next(&mut self) -> Option<u64> {
+        if self.index >= self.chunk_len {
+            return None;
+        }
+
+        // cast to *const u64 should be fine since we are using read_unaligned
+        #[allow(clippy::cast_ptr_alignment)]
+        let current = unsafe {
+            std::ptr::read_unaligned((self.raw_data as *const u64).add(self.index))
+        };
+
+        let combined = if self.offset == 0 {
+            current
+        } else {
+            // cast to *const u64 should be fine since we are using read_unaligned
+            #[allow(clippy::cast_ptr_alignment)]
+            let next = unsafe {
+                std::ptr::read_unaligned(
+                    (self.raw_data as *const u64).add(self.index + 1),
+                )
+            };
+            current >> self.offset
+                | (next & ((1 << self.offset) - 1)) << (64 - self.offset)
+        };
+
+        self.index += 1;
+
+        Some(combined)
+    }
+
+    #[inline]
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        (
+            self.chunk_len - self.index,
+            Some(self.chunk_len - self.index),
+        )
+    }
+}
+
+impl ExactSizeIterator for BitChunkIterator<'_> {
+    #[inline]
+    fn len(&self) -> usize {
+        self.chunk_len - self.index
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use crate::buffer::Buffer;
+
+    #[test]
+    fn test_iter_aligned() {

Review comment:
       Endianness also confuses me, but I think this test is correct. It might have been clearer if I had also written it using binary literals.
   
   We are counting bits starting from the least significant, since the data is using bytes it should not matter whether it's big- or little-endian. So bit 0 is the least significant bit in the first byte, bit 8 is the least significant in the second byte. The order in the u64 should be the same, so bit 8, starting from the least significant on the right is the 1 one from the second input byte. I hope this makes sense.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#issuecomment-704548277


   @nevi-me @jorgecarleitao I removed the draft status of this PR since I finally managed to make use of this refactoring in an aggregation kernel (#8370). The performance there looks really nice. I'm still open for suggestions about moving these bit-oriented methods maybe out of `Buffer`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#issuecomment-705176180


   I ran some benchmarks and it seems the simd versions of `bitwise_bin_op` etc are only about 20% to 50% faster than the iterator based ones. I think it would be an acceptable tradeoff to remove them to simplify the code.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8262: ARROW-10040: [Rust] Iterate over and combine boolean buffers with arbitrary offsets

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8262:
URL: https://github.com/apache/arrow/pull/8262#issuecomment-705064977


   Thanks for the PR, @jhorstmann . I will defer to others as bitmaps and offsets are beyond my skills atm.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org