You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/23 14:39:41 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request #1228: Unaligned bit chunk

tustvold opened a new pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228


   **Draft as builds on #1225** 
   
   # Which issue does this PR close?
   
   Closes #1227.
   
   # Rationale for this change
    
   This improves the filter benchmarks by a factor of 2x, and likely will have similar benefits elsewhere
   
   ```
   filter u8               time:   [140.44 us 140.61 us 140.76 us]                      
                           change: [-51.558% -51.392% -51.226%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 high selectivity                                                                             
                           time:   [2.4576 us 2.4587 us 2.4601 us]
                           change: [-53.091% -52.977% -52.863%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 low selectivity                                                                             
                           time:   [1.6956 us 1.6981 us 1.7010 us]
                           change: [-60.543% -60.284% -59.829%] (p = 0.00 < 0.05)
                           Performance has improved.
                           
   filter f32              time:   [418.84 us 419.45 us 420.15 us]                       
                           change: [-26.414% -26.286% -26.141%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter single record batch                                                                            
                           time:   [183.96 us 188.44 us 193.16 us]
                           change: [-33.234% -31.549% -29.597%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   ```
   Note: the filter context benchmarks don't see a signficant benefit as the filter is computed outside the benchmark body.
   
   # What changes are included in this PR?
   
   This adds an UnalignedBitChunkIterator and updates SlicesIterator to use it
   
   # Are there any user-facing changes?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1028336590


   Thanks @tustvold  and @jhorstmann 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795042188



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       I've updated these tests to hopefully be clearer, let me know if that helps




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1019500860


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1228](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f809d33) into [master](https://codecov.io/gh/apache/arrow-rs/commit/aa71aeaa3d6f0345690490588640226701e1ac15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (aa71aea) will **increase** coverage by `0.03%`.
   > The diff coverage is `93.91%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1228/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1228      +/-   ##
   ==========================================
   + Coverage   82.96%   83.00%   +0.03%     
   ==========================================
     Files         178      178              
     Lines       51522    51690     +168     
   ==========================================
   + Hits        42744    42904     +160     
   - Misses       8778     8786       +8     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/util/bit\_chunk\_iterator.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X2NodW5rX2l0ZXJhdG9yLnJz) | `93.47% <92.43%> (-4.33%)` | :arrow_down: |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `92.13% <96.77%> (+2.57%)` | :arrow_up: |
   | [arrow/src/buffer/immutable.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2J1ZmZlci9pbW11dGFibGUucnM=) | `99.45% <100.00%> (+0.52%)` | :arrow_up: |
   | [parquet/src/arrow/bit\_util.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYml0X3V0aWwucnM=) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.38% <0.00%> (-0.43%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `84.64% <0.00%> (-0.13%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [aa71aea...f809d33](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796904009



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       They're currently in a crate private module in parquet somewhat intentionally. I'll have a play with the iterator approach you propose and see if it has an impact on perf




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1019500860


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1228](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c14f8c5) into [master](https://codecov.io/gh/apache/arrow-rs/commit/aa71aeaa3d6f0345690490588640226701e1ac15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (aa71aea) will **increase** coverage by `0.02%`.
   > The diff coverage is `93.19%`.
   
   > :exclamation: Current head c14f8c5 differs from pull request most recent head 6c1bb85. Consider uploading reports for the commit 6c1bb85 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1228/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1228      +/-   ##
   ==========================================
   + Coverage   82.96%   82.99%   +0.02%     
   ==========================================
     Files         178      178              
     Lines       51522    51662     +140     
   ==========================================
   + Hits        42744    42875     +131     
   - Misses       8778     8787       +9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/util/bit\_chunk\_iterator.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X2NodW5rX2l0ZXJhdG9yLnJz) | `93.33% <92.17%> (-4.47%)` | :arrow_down: |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `91.51% <95.00%> (+1.96%)` | :arrow_up: |
   | [arrow/src/buffer/immutable.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2J1ZmZlci9pbW11dGFibGUucnM=) | `99.45% <100.00%> (+0.52%)` | :arrow_up: |
   | [parquet/src/arrow/bit\_util.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYml0X3V0aWwucnM=) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.38% <0.00%> (-0.43%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `84.51% <0.00%> (-0.26%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [aa71aea...6c1bb85](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795044493



##########
File path: parquet/src/arrow/bit_util.rs
##########
@@ -15,40 +15,33 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use arrow::util::bit_chunk_iterator::BitChunks;
+use arrow::util::bit_chunk_iterator::UnalignedBitChunk;
 use std::ops::Range;
 
 /// Counts the number of set bits in the provided range
 pub fn count_set_bits(bytes: &[u8], range: Range<usize>) -> usize {
-    let mut count = 0_usize;
-    let chunks = BitChunks::new(bytes, range.start, range.end - range.start);
-    chunks.iter().for_each(|chunk| {
-        count += chunk.count_ones() as usize;
-    });
-    count += chunks.remainder_bits().count_ones() as usize;
-    count
+    let unaligned = UnalignedBitChunk::new(bytes, range.start, range.end - range.start);

Review comment:
       The slice of bits being iterated on isn't even byte aligned, as range is in bits not bytes




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796673841



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {

Review comment:
       Done - was not an intentional omission




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1019500860


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1228](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7959411) into [master](https://codecov.io/gh/apache/arrow-rs/commit/fcd37ee1f8a8c3f5af099419aa9667bf0a515595?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fcd37ee) will **increase** coverage by `0.03%`.
   > The diff coverage is `93.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1228/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1228      +/-   ##
   ==========================================
   + Coverage   82.70%   82.74%   +0.03%     
   ==========================================
     Files         175      175              
     Lines       51711    51854     +143     
   ==========================================
   + Hits        42769    42906     +137     
   - Misses       8942     8948       +6     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/util/bit\_chunk\_iterator.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X2NodW5rX2l0ZXJhdG9yLnJz) | `93.33% <92.17%> (-4.47%)` | :arrow_down: |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `91.51% <95.00%> (+1.96%)` | :arrow_up: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `85.47% <100.00%> (-0.10%)` | :arrow_down: |
   | [arrow/src/buffer/immutable.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2J1ZmZlci9pbW11dGFibGUucnM=) | `99.45% <100.00%> (+0.52%)` | :arrow_up: |
   | [...rquet/src/arrow/record\_reader/definition\_levels.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvcmVjb3JkX3JlYWRlci9kZWZpbml0aW9uX2xldmVscy5ycw==) | `88.32% <100.00%> (-0.18%)` | :arrow_down: |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `93.52% <0.00%> (-0.20%)` | :arrow_down: |
   | [arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2Nzdi9yZWFkZXIucnM=) | `88.12% <0.00%> (+0.01%)` | :arrow_up: |
   | [arrow/src/datatypes/field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9maWVsZC5ycw==) | `54.10% <0.00%> (+0.30%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [fcd37ee...7959411](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
tustvold commented on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1019939399


   In order to test how much of the performance uplift was the changes to SlicesIterator and how much UnalignedChunk I created a branch with the changes to SlicesIterator but using BitChunks instead of UnalignedBitChunks.
   
   #1225 
   
   ```
   filter u8              
                           time:   [289.51 us 289.72 us 289.93 us]                      
   
   filter u8 high selectivity                                                                             
                           time:   [5.2759 us 5.2786 us 5.2819 us]
   
   filter u8 low selectivity                                                                             
                           time:   [3.8342 us 3.8385 us 3.8453 us]
   ```
   
   Improved SlicesIterator only
   
   ```
   filter u8              
                           time:   [175.13 us 175.16 us 175.20 us]                      
   
   filter u8 high selectivity                                                                             
                           time:   [2.5869 us 2.5880 us 2.5892 us]
   
   filter u8 low selectivity                                                                             
                           time:   [1.9370 us 1.9388 us 1.9407 us]
   ```
   
   This PR
   
   ```
   filter u8               
                           time:   [140.52 us 140.55 us 140.58 us]                      
   
   filter u8 high selectivity                                                                             
                           time:   [2.5015 us 2.5023 us 2.5033 us]
   
   filter u8 low selectivity                                                                             
                           time:   [1.7586 us 1.7591 us 1.7597 us]
   ```
   
   So the UnalignedBitChunks does yield a non-negligible performance benefit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r797007256



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       So I ran into challenges making a reversible `UnalignedBitChunkIterator` which is necessary for `iter_set_bits_rev`, which actually led me to a simpler solution - just implement `iter_set_bits_rev` in terms of `UnalignedBitChunk` and make the iterator crate private. Let me know what you think of this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r797091939



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
tustvold commented on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1024533037


   I've rebased this to not be on top of #1225 as it is pending some more thought, without that fix this makes a smaller delta but still not insignificant
   
   ```
   filter u8               time:   [359.76 us 359.84 us 359.93 us]                      
                           change: [-27.199% -27.161% -27.119%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 high selectivity                                                                             
                           time:   [8.8616 us 8.9026 us 8.9439 us]
                           change: [-29.670% -29.463% -29.275%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 low selectivity                                                                             
                           time:   [2.4243 us 2.4249 us 2.4254 us]
                           change: [-45.812% -45.627% -45.478%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   Perhaps more importantly I hope to use this to implement optimized filter kernels


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796655122



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {

Review comment:
       What do you think about adding this documentation?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796667308



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -693,4 +630,61 @@ mod tests {
         assert_eq!(out.data_type(), &DataType::Int64);
         Ok(())
     }
+

Review comment:
       I wrote another test to convince myself this one was correct and I think it also helps serve as a test:
   
   ```suggestion
   
       #[test]
       fn test_slices() {
           // takes up 2 u64s
           let bools = std::iter::repeat(true).take(10)
               .chain(std::iter::repeat(false).take(30))
               .chain(std::iter::repeat(true).take(20))
               .chain(std::iter::repeat(false).take(17))
               .chain(std::iter::repeat(true).take(4));
   
           let bool_array: BooleanArray = bools.map(Some).collect();
   
           let slices: Vec<_> = SlicesIterator::new(&bool_array).collect();
           let expected = vec![(0, 10), (40, 60), (77, 81)];
           assert_eq!(slices, expected);
   
           // slice with offset and truncated len
           let len = bool_array.len();
           let sliced_array = bool_array.slice(7, len-10);
           let sliced_array = sliced_array
               .as_any()
               .downcast_ref::<BooleanArray>()
               .unwrap();
           let slices: Vec<_> = SlicesIterator::new(&sliced_array).collect();
           let expected = vec![(0, 3), (33, 53), (70, 71)];
           assert_eq!(slices, expected);
       }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796856635



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       I just realized, it is being used in the parquet crate...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r790285853



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {
+        loop {
+            if self.current_chunk != 0 {
+                // Find the index of the first 1
+                let bit_pos = self.current_chunk.trailing_zeros();

Review comment:
       Tbh this is likely the "trick" that is yielding most of the filter speedup. x86 and ARM both have specific instructions for this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795042188



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       I've updated these tests to hopefully be clearer, let me know if that helps. 
   
   Edit: I'm looking into SlicesIterator now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795038980



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       These tests might be easier to understand by asserting against a binary literal.
   
   I added a bit of debug output locally and I'm not sure whether this should be the expected output:
   
   ```
   eprintln!("{:064b}", unaligned.prefix().unwrap());
   
   // output: 0000000000000000000000000000000011111111111111111111111111100000 
   ```
   
   Would have expected the prefix chunk to not have trailing zeroes, and instead have a length less than 64 bits. That does not make a difference when counting bits, but I find it a bit confusing.
   
   I don't yet understand how the current behavior interacts in the `advance_to_set_bit` function. The way I understand it, `offset` would be 5 for this example, then `trailing_zeros` would also be 5, and then in `SlicesIterator` `start_chunk + start_bit` would be 10. But I'm probably missing something there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1019500860


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1228](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4cab321) into [master](https://codecov.io/gh/apache/arrow-rs/commit/aa71aeaa3d6f0345690490588640226701e1ac15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (aa71aea) will **increase** coverage by `0.02%`.
   > The diff coverage is `93.24%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1228/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1228      +/-   ##
   ==========================================
   + Coverage   82.96%   82.99%   +0.02%     
   ==========================================
     Files         178      178              
     Lines       51522    51664     +142     
   ==========================================
   + Hits        42744    42877     +133     
   - Misses       8778     8787       +9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/util/bit\_chunk\_iterator.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X2NodW5rX2l0ZXJhdG9yLnJz) | `93.38% <92.26%> (-4.42%)` | :arrow_down: |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `91.51% <95.00%> (+1.96%)` | :arrow_up: |
   | [arrow/src/buffer/immutable.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2J1ZmZlci9pbW11dGFibGUucnM=) | `99.45% <100.00%> (+0.52%)` | :arrow_up: |
   | [parquet/src/arrow/bit\_util.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYml0X3V0aWwucnM=) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.38% <0.00%> (-0.43%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `65.98% <0.00%> (-0.23%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1228/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `84.64% <0.00%> (-0.13%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [aa71aea...4cab321](https://codecov.io/gh/apache/arrow-rs/pull/1228?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796855529



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Marking it internal for now sounds good, I'm ok with merging it then. The performance improvements are really nice, and we can probably find more use cases for it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796614899



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Tests are much nicer to read now, thanks. My one concern is that the actual `UnalignedBitChunkIterator` iterator is a bit difficult to use, because it requires a bit of special handling for the first element, which has to be shifted by `lead_padding`. I played around with a similar design, and one idea is to return an iterator of `(start_offset, mask, len)` tuples.
   
   Example: Bitmap consisting of 130 one bits, starting at offset 60, the iterator would return
   ```
   (0, 0b1111, 4)
   (4, u64::MAX, 64)
   (68, 0b11, 2)
   ```
   
   Users would then need no special logic for prefix/suffix. When iterating over all set bits you could use `trailing_zeros` on each chunk and add the `start_offset`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796898102



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Would moving the `count_set_bits` and `iter_set_bits_rev` functions to the arrow crate be an option, and then hide the `UnalignedBitChunk` as their implementation detail? I think they were added after the last release, so that would not even be a breaking change. On the other hand, `iter_set_bits_rev` seems very specific to the parquet usecase.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r797076217



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Sounds good. For the `SlicesIterator` the current definition of `prefix`/`lead_padding` seems to work better than my idea above. If we want to change the behavior at some later point we could rename those methods in a major release. Behavior change would have been bad with a public `iter` method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795043370



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {
+        loop {
+            if self.current_chunk != 0 {
+                // Find the index of the first 1
+                let bit_pos = self.current_chunk.trailing_zeros();
+                return Some((self.offset, bit_pos));
             }
-            self.current_bit += 1;
-        }
-        self.current_bit = 0;
-        None
-    }
 
-    /// iterates over chunks.
-    #[inline]
-    fn iterate_chunks(&mut self) -> Option<(usize, usize)> {
-        while let Some((i, mask)) = self.iter.next() {
-            self.current_chunk = i;
-            if mask == 0 {
-                if self.on_region {
-                    let result = (self.start, self.start + self.len);
-                    self.len = 0;
-                    self.on_region = false;
-                    return Some(result);
-                }
-            } else if mask == 18446744073709551615u64 {
-                // = !0u64
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 64;
-            } else {
-                // there is a chunk that has a non-trivial mask => iterate over bits.
-                self.state = State::Bits(mask);
-                return None;
-            }
+            self.current_chunk = self.iter.next()?;
+            self.offset += 64;
         }
-        // no more chunks => start iterating over the remainder
-        self.current_chunk = self.chunk_len;
-        self.state = State::Remainder;
-        None
     }
 }
 
 impl<'a> Iterator for SlicesIterator<'a> {
     type Item = (usize, usize);
 
     fn next(&mut self) -> Option<Self::Item> {
-        match self.state {
-            State::Chunks => {
-                match self.iterate_chunks() {
-                    None => {
-                        // iterating over chunks does not yield any new slice => continue to the next
-                        self.current_bit = 0;
-                        self.next()
-                    }
-                    other => other,
-                }
+        // Used as termination condition
+        if self.len == 0 {
+            return None;
+        }
+
+        let (start_chunk, start_bit) = self.advance_to_set_bit()?;
+
+        // Set bits up to start
+        self.current_chunk |= (1 << start_bit) - 1;

Review comment:
       I think I understand what is going on here, but I am not 100% confident
   
   Would it be possible to write some unit tests of `SlicesIter` against `BooleanArray`s (I can help write them if you would like) that could demonstrate how the logic was working, and especially cover the corner cases (like enture `0xffffffff` and bits set within a mask?) 

##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {

Review comment:
       ```suggestion
       /// Returns `Some((chunk_offset, bit_offset))` for the next chunk that has at 
       /// least one but set, or None if there is no such chunk.
       /// 
       /// where `chunk_offset` is the bit offset to the current `usize`d chunk 
       /// and `bit_offset` is the offset of the first `1` bit in that chunk
       fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {
   ```

##########
File path: parquet/src/arrow/bit_util.rs
##########
@@ -15,40 +15,33 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use arrow::util::bit_chunk_iterator::BitChunks;
+use arrow::util::bit_chunk_iterator::UnalignedBitChunk;
 use std::ops::Range;
 
 /// Counts the number of set bits in the provided range
 pub fn count_set_bits(bytes: &[u8], range: Range<usize>) -> usize {
-    let mut count = 0_usize;
-    let chunks = BitChunks::new(bytes, range.start, range.end - range.start);
-    chunks.iter().for_each(|chunk| {
-        count += chunk.count_ones() as usize;
-    });
-    count += chunks.remainder_bits().count_ones() as usize;
-    count
+    let unaligned = UnalignedBitChunk::new(bytes, range.start, range.end - range.start);

Review comment:
       Just to check my understanding, you said above that `UnalignedBitChunk` was slower than `BitChunks` as it handled unaligned data; This function is dealing with input that *is* aligned, but only to `u8`,  so it is still better to use the `UnalignedBitChunk`iterator
   
   

##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {
+        loop {
+            if self.current_chunk != 0 {
+                // Find the index of the first 1
+                let bit_pos = self.current_chunk.trailing_zeros();

Review comment:
       TIL: https://doc.rust-lang.org/std/primitive.u64.html#method.trailing_zeros 🤔 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795044493



##########
File path: parquet/src/arrow/bit_util.rs
##########
@@ -15,40 +15,33 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use arrow::util::bit_chunk_iterator::BitChunks;
+use arrow::util::bit_chunk_iterator::UnalignedBitChunk;
 use std::ops::Range;
 
 /// Counts the number of set bits in the provided range
 pub fn count_set_bits(bytes: &[u8], range: Range<usize>) -> usize {
-    let mut count = 0_usize;
-    let chunks = BitChunks::new(bytes, range.start, range.end - range.start);
-    chunks.iter().for_each(|chunk| {
-        count += chunk.count_ones() as usize;
-    });
-    count += chunks.remainder_bits().count_ones() as usize;
-    count
+    let unaligned = UnalignedBitChunk::new(bytes, range.start, range.end - range.start);

Review comment:
       The slice of bits being iterated on isn't even byte aligned, as range is in bits not bytes. The reason UnalignedBitChunk is faster here is it doesn't try to produce aligned output, which doesn't matter for counting set bits




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold edited a comment on pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold edited a comment on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1024895206


   I think SlicesIterator has a bug :disappointed: - fixing... It's applying the offset in the wrong direction :sweat_smile: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r790285853



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -17,184 +17,117 @@
 
 //! Defines miscellaneous array kernels.
 
+use crate::array::*;
 use crate::buffer::buffer_bin_and;
 use crate::datatypes::DataType;
 use crate::error::Result;
 use crate::record_batch::RecordBatch;
-use crate::{array::*, util::bit_chunk_iterator::BitChunkIterator};
-use std::iter::Enumerate;
+use crate::util::bit_chunk_iterator::{UnalignedBitChunk, UnalignedBitChunkIterator};
 
 /// Function that can filter arbitrary arrays
 pub type Filter<'a> = Box<dyn Fn(&ArrayData) -> ArrayData + 'a>;
 
-/// Internal state of [SlicesIterator]
-#[derive(Debug, PartialEq)]
-enum State {
-    // it is iterating over bits of a mask (`u64`, steps of size of 1 slot)
-    Bits(u64),
-    // it is iterating over chunks (steps of size of 64 slots)
-    Chunks,
-    // it is iterating over the remaining bits (steps of size of 1 slot)
-    Remainder,
-    // nothing more to iterate.
-    Finish,
-}
-
 /// An iterator of `(usize, usize)` each representing an interval `[start,end[` whose
 /// slots of a [BooleanArray] are true. Each interval corresponds to a contiguous region of memory to be
 /// "taken" from an array to be filtered.
 #[derive(Debug)]
 pub struct SlicesIterator<'a> {
-    iter: Enumerate<BitChunkIterator<'a>>,
-    state: State,
-    filter: &'a BooleanArray,
-    remainder_mask: u64,
-    remainder_len: usize,
-    chunk_len: usize,
+    iter: UnalignedBitChunkIterator<'a>,
     len: usize,
-    start: usize,
-    on_region: bool,
-    current_chunk: usize,
-    current_bit: usize,
+    offset: usize,
+    current_chunk: u64,
 }
 
 impl<'a> SlicesIterator<'a> {
     pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-        let chunks = values.bit_chunks(filter.offset(), filter.len());
+        let len = filter.len();
+        let chunk = UnalignedBitChunk::new(values.as_slice(), filter.offset(), len);
+        let mut iter = chunk.iter();
+
+        let offset = chunk.lead_padding();
+        let current_chunk = iter.next().unwrap_or(0);
 
         Self {
-            iter: chunks.iter().enumerate(),
-            state: State::Chunks,
-            filter,
-            remainder_len: chunks.remainder_len(),
-            chunk_len: chunks.chunk_len(),
-            remainder_mask: chunks.remainder_bits(),
-            len: 0,
-            start: 0,
-            on_region: false,
-            current_chunk: 0,
-            current_bit: 0,
+            iter,
+            len,
+            offset,
+            current_chunk,
         }
     }
 
-    /// Counts the number of set bits in the filter array.
-    fn filter_count(&self) -> usize {
-        let values = self.filter.values();
-        values.count_set_bits_offset(self.filter.offset(), self.filter.len())
-    }
-
-    #[inline]
-    fn current_start(&self) -> usize {
-        self.current_chunk * 64 + self.current_bit
-    }
-
-    #[inline]
-    fn iterate_bits(&mut self, mask: u64, max: usize) -> Option<(usize, usize)> {
-        while self.current_bit < max {
-            if (mask & (1 << self.current_bit)) != 0 {
-                if !self.on_region {
-                    self.start = self.current_start();
-                    self.on_region = true;
-                }
-                self.len += 1;
-            } else if self.on_region {
-                let result = (self.start, self.start + self.len);
-                self.len = 0;
-                self.on_region = false;
-                self.current_bit += 1;
-                return Some(result);
+    fn advance_to_set_bit(&mut self) -> Option<(usize, u32)> {
+        loop {
+            if self.current_chunk != 0 {
+                // Find the index of the first 1
+                let bit_pos = self.current_chunk.trailing_zeros();

Review comment:
       Tbh this is likely the "trick" that is yielding most of the filter speedup. x86 has a specific instruction for this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Unaligned bit chunk

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r790286234



##########
File path: parquet/src/arrow/record_reader/definition_levels.rs
##########
@@ -353,34 +353,27 @@ impl PackedDecoder {
 
 /// Counts the number of set bits in the provided range
 pub fn count_set_bits(bytes: &[u8], range: Range<usize>) -> usize {
-    let mut count = 0_usize;
-    let chunks = BitChunks::new(bytes, range.start, range.end - range.start);
-    chunks.iter().for_each(|chunk| {
-        count += chunk.count_ones() as usize;
-    });
-    count += chunks.remainder_bits().count_ones() as usize;
-    count
+    let unaligned = UnalignedBitChunk::new(bytes, range.start, range.end - range.start);
+    unaligned.count_ones()
 }
 
 fn iter_set_bits_rev(bytes: &[u8]) -> impl Iterator<Item = usize> + '_ {
-    let (mut byte_idx, mut in_progress) = match bytes.len() {
-        0 => (0, 0),
-        len => (len - 1, bytes[len - 1]),
-    };
-
-    std::iter::from_fn(move || loop {
-        if in_progress != 0 {
-            let bit_pos = 7 - in_progress.leading_zeros();
-            in_progress ^= 1 << bit_pos;
-            return Some((byte_idx << 3) + (bit_pos as usize));
-        }
-
-        if byte_idx == 0 {
+    let bit_length = bytes.len() * 8;

Review comment:
       This wasn't ever really a bottleneck in the parquet parsing, but still sees a slight aggregate improvement of 2-3%. Updated for curiosity more than necessity.
   
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1028230786


   :+1: from my side


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r796687039



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -272,4 +462,149 @@ mod tests {
         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
         assert_eq!(0x7F, bitchunks.remainder_bits());
     }
+
+    #[test]
+    #[allow(clippy::assertions_on_constants)]
+    fn test_unaligned_bit_chunk_iterator() {
+        // This test exploits the fact Buffer is at least 64-byte aligned
+        assert!(ALIGNMENT > 64);
+
+        let buffer = Buffer::from(&[0xFF; 5]);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 40);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 40) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 24);
+
+        let buffer = buffer.slice(1);
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 0, 32);
+
+        assert_eq!(unaligned.prefix(), Some((1 << 32) - 1));
+        assert_eq!(unaligned.suffix(), None);
+        assert!(unaligned.chunks().is_empty());
+        assert_eq!(unaligned.lead_padding(), 0);
+        assert_eq!(unaligned.trailing_padding(), 32);
+
+        let unaligned = UnalignedBitChunk::new(buffer.as_slice(), 5, 27);
+
+        assert_eq!(unaligned.prefix(), Some(((1 << 32) - 1) - ((1 << 5) - 1)));

Review comment:
       Yeah I agree that UnalignedBitChunkIterator isn't the easiest construction to use, although this is sort of by design. It is not intended as a replacement for BitChunkIterator, but rather something for where you are willing to pay the cost of more complex start and termination logic, for simpler logic within the main loop itself...
   
   Would it allay your concerns if I made it pub(crate) so that we can continue to iterate on it without introducing breaking changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#discussion_r795041965



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -15,8 +17,192 @@
 // specific language governing permissions and limitations
 // under the License.
 use crate::util::bit_util::ceil;
-use std::fmt::Debug;
 
+/// Iterates over an arbitrarily aligned byte buffer
+///
+/// Yields an iterator of aligned u64, along with the leading and trailing
+/// u64 necessary to align the buffer to a 8-byte boundary
+///
+/// This is unlike [`BitChunkIterator`] which only exposes a trailing u64,

Review comment:
       👍  good rationale




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on pull request #1228: Faster bitmask iteration

Posted by GitBox <gi...@apache.org>.
tustvold commented on pull request #1228:
URL: https://github.com/apache/arrow-rs/pull/1228#issuecomment-1024895206


   I think SlicesIterator has a bug :disappointed: - fixing...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org