You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/01 13:51:31 UTC

[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1228: Faster bitmask iteration

jhorstmann commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796614899



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Tests are much nicer to read now, thanks. My one concern is that the actual `UnalignedBitChunkIterator` iterator is a bit difficult to use, because it requires a bit of special handling for the first element, which has to be shifted by `lead_padding`. I played around with a similar design, and one idea is to return an iterator of `(start_offset, mask, len)` tuples.
   
   Example: Bitmap consisting of 130 one bits, starting at offset 60, the iterator would return
   ```
   (0, 0b1111, 4)
   (4, u64::MAX, 64)
   (68, 0b11, 2)
   ```
   
   Users would then need no special logic for prefix/suffix. When iterating over all set bits you could use `trailing_zeros` on each chunk and add the `start_offset`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org