You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "viirya (via GitHub)" <gi...@apache.org> on 2023/03/12 22:44:16 UTC

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3848: Add offset pushdown to parquet

viirya commented on code in PR #3848:
URL: https://github.com/apache/arrow-rs/pull/3848#discussion_r1133335614


##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -372,29 +371,63 @@ impl RowSelection {
         self
     }
 
+    /// Applies an offset to this [`RowSelection`], skipping the first `offset` selected rows
+    pub(crate) fn offset(mut self, offset: usize) -> Self {
+        if offset == 0 {
+            return self;
+        }
+
+        let mut selected_count = 0;
+        let mut skipped_count = 0;
+
+        // Find the index where the selector exceeds the row count
+        let find = self
+            .selectors
+            .iter()
+            .position(|selector| match selector.skip {
+                true => {
+                    skipped_count += selector.row_count;
+                    false
+                }
+                false => {
+                    selected_count += selector.row_count;
+                    selected_count > offset
+                }
+            });
+
+        let split_idx = match find {
+            Some(idx) => idx,
+            None => {
+                self.selectors.clear();
+                return self;
+            }
+        };
+
+        let mut selectors = Vec::with_capacity(self.selectors.len() - split_idx + 1);
+        selectors.push(RowSelector::skip(skipped_count + offset));
+        selectors.push(RowSelector::select(selected_count - offset));

Review Comment:
   `skip` and `select` cannot be interleaved? i.e. all `skip` selections are in front of `select` selection?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org