You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/13 14:45:07 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #1865: Expose BitSliceIterator and BitIndexIterator (#1864)

tustvold opened a new pull request, #1865:
URL: https://github.com/apache/arrow-rs/pull/1865

   # Which issue does this PR close?
   
   Part of #1864
   
   # Rationale for this change
   
   We have heavily optimised code for iterating bitmasks, we should use it in more places
   
   # What changes are included in this PR?
   
   Extracts the core logic of filter::SlicesIterator and filter::IndexIterator into a public module where they can be used by other code
   
   # Are there any user-facing changes?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #1865: Expose BitSliceIterator and BitIndexIterator (#1864)

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1865:
URL: https://github.com/apache/arrow-rs/pull/1865#discussion_r897968381


##########
arrow/src/compute/kernels/filter.rs:
##########
@@ -165,47 +97,28 @@ impl<'a> Iterator for SlicesIterator<'a> {
 /// This provides the best performance on most predicates, apart from those which keep
 /// large runs and therefore favour [`SlicesIterator`]
 struct IndexIterator<'a> {
-    current_chunk: u64,
-    chunk_offset: i64,
     remaining: usize,
-    iter: UnalignedBitChunkIterator<'a>,
+    iter: BitIndexIterator<'a>,
 }
 
 impl<'a> IndexIterator<'a> {
-    fn new(filter: &'a BooleanArray, len: usize) -> Self {
+    fn new(filter: &'a BooleanArray, remaining: usize) -> Self {
         assert_eq!(filter.null_count(), 0);
         let data = filter.data();
-        let chunks =
-            UnalignedBitChunk::new(&data.buffers()[0], data.offset(), data.len());
-        let mut iter = chunks.iter();
-
-        let current_chunk = iter.next().unwrap_or(0);
-        let chunk_offset = -(chunks.lead_padding() as i64);
-
-        Self {
-            current_chunk,
-            chunk_offset,
-            remaining: len,
-            iter,
-        }
+        let iter = BitIndexIterator::new(&data.buffers()[0], data.offset(), data.len());
+        Self { remaining, iter }
     }
 }
 
 impl<'a> Iterator for IndexIterator<'a> {
     type Item = usize;
 
     fn next(&mut self) -> Option<Self::Item> {
-        while self.remaining != 0 {
-            if self.current_chunk != 0 {
-                let bit_pos = self.current_chunk.trailing_zeros();
-                self.current_chunk ^= 1 << bit_pos;
-                self.remaining -= 1;
-                return Some((self.chunk_offset + bit_pos as i64) as usize);
-            }
-
+        if self.remaining != 0 {
+            let next = self.iter.next().expect("IndexIterator exhausted early");

Review Comment:
   ```suggestion
               // Fascinatingly swapping these two lines around results in a 50% 
               // performance regression for some benchmarks
               let next = self.iter.next().expect("IndexIterator exhausted early");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #1865: Expose `BitSliceIterator` and `BitIndexIterator` (#1864)

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #1865:
URL: https://github.com/apache/arrow-rs/pull/1865


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] codecov-commenter commented on pull request #1865: Expose BitSliceIterator and BitIndexIterator (#1864)

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #1865:
URL: https://github.com/apache/arrow-rs/pull/1865#issuecomment-1154480873

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1865?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1865](https://codecov.io/gh/apache/arrow-rs/pull/1865?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8ad92b9) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3073a26bc1f28f799e1034306e8d6df40364ec69?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3073a26) will **increase** coverage by `0.00%`.
   > The diff coverage is `98.14%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #1865   +/-   ##
   =======================================
     Coverage   83.48%   83.48%           
   =======================================
     Files         201      202    +1     
     Lines       57000    57010   +10     
   =======================================
   + Hits        47584    47595   +11     
   + Misses       9416     9415    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1865?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/util/bit\_chunk\_iterator.rs](https://codecov.io/gh/apache/arrow-rs/pull/1865/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X2NodW5rX2l0ZXJhdG9yLnJz) | `93.83% <ø> (ø)` | |
   | [arrow/src/util/bit\_iterator.rs](https://codecov.io/gh/apache/arrow-rs/pull/1865/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X2l0ZXJhdG9yLnJz) | `97.77% <97.77%> (ø)` | |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/1865/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `88.05% <100.00%> (-0.28%)` | :arrow_down: |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1865/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `65.42% <0.00%> (-0.38%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1865/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `65.98% <0.00%> (+0.22%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1865?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1865?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3073a26...8ad92b9](https://codecov.io/gh/apache/arrow-rs/pull/1865?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1865: Expose BitSliceIterator and BitIndexIterator (#1864)

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #1865:
URL: https://github.com/apache/arrow-rs/pull/1865#discussion_r895810538


##########
arrow/src/compute/kernels/filter.rs:
##########
@@ -165,47 +97,28 @@ impl<'a> Iterator for SlicesIterator<'a> {
 /// This provides the best performance on most predicates, apart from those which keep
 /// large runs and therefore favour [`SlicesIterator`]
 struct IndexIterator<'a> {
-    current_chunk: u64,
-    chunk_offset: i64,
     remaining: usize,
-    iter: UnalignedBitChunkIterator<'a>,
+    iter: BitIndexIterator<'a>,
 }
 
 impl<'a> IndexIterator<'a> {
-    fn new(filter: &'a BooleanArray, len: usize) -> Self {
+    fn new(filter: &'a BooleanArray, remaining: usize) -> Self {
         assert_eq!(filter.null_count(), 0);
         let data = filter.data();
-        let chunks =
-            UnalignedBitChunk::new(&data.buffers()[0], data.offset(), data.len());
-        let mut iter = chunks.iter();
-
-        let current_chunk = iter.next().unwrap_or(0);
-        let chunk_offset = -(chunks.lead_padding() as i64);
-
-        Self {
-            current_chunk,
-            chunk_offset,
-            remaining: len,
-            iter,
-        }
+        let iter = BitIndexIterator::new(&data.buffers()[0], data.offset(), data.len());
+        Self { remaining, iter }
     }
 }
 
 impl<'a> Iterator for IndexIterator<'a> {
     type Item = usize;
 
     fn next(&mut self) -> Option<Self::Item> {
-        while self.remaining != 0 {
-            if self.current_chunk != 0 {
-                let bit_pos = self.current_chunk.trailing_zeros();
-                self.current_chunk ^= 1 << bit_pos;
-                self.remaining -= 1;
-                return Some((self.chunk_offset + bit_pos as i64) as usize);
-            }
-
+        if self.remaining != 0 {
+            let next = self.iter.next().expect("IndexIterator exhausted early");
+            self.remaining -= 1;

Review Comment:
   Fascinatingly swapping these two lines around results in a 50% performance regression for some benchmarks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org