You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/06 01:03:10 UTC

[GitHub] [arrow-rs] yjshen commented on a change in pull request #1389: Introduce `ReadOptions` with builder API, filter row groups that satisfy all filters, and enable filter row groups by range.

yjshen commented on a change in pull request #1389:
URL: https://github.com/apache/arrow-rs/pull/1389#discussion_r820164755



##########
File path: parquet/src/file/serialized_reader.rs
##########
@@ -127,6 +127,56 @@ pub struct SerializedFileReader<R: ChunkReader> {
     metadata: ParquetMetaData,
 }
 
+/// A builder for [`ReadOptions`].
+/// For the predicates that are added to the builder,
+/// they will be chained using 'AND' to filter the row groups.
+pub struct ReadOptionsBuilder {
+    predicates: Vec<Box<dyn FnMut(&RowGroupMetaData, usize) -> bool>>,
+}
+
+impl ReadOptionsBuilder {
+    /// New builder
+    pub fn new() -> Self {
+        ReadOptionsBuilder { predicates: vec![] }
+    }
+
+    /// Add a predicate on row group metadata to the reading option,
+    /// Filter only row groups that match the predicate criteria
+    pub fn with_predicate(
+        mut self,
+        predicate: Box<dyn FnMut(&RowGroupMetaData, usize) -> bool>,
+    ) -> Self {
+        self.predicates.push(predicate);
+        self
+    }
+
+    /// Add a range predicate on filtering row groups if their midpoints are within the range

Review comment:
       Thanks, this is important. I've updated the doc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org