Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/03 21:38:16 UTC

[GitHub] [arrow-rs] sunchao commented on a change in pull request #1389: Filter row groups by comparing midpoint with offset range

sunchao commented on a change in pull request #1389:
URL: https://github.com/apache/arrow-rs/pull/1389#discussion_r819082487



##########
File path: parquet/src/file/serialized_reader.rs
##########
@@ -138,25 +138,51 @@ impl<R: 'static + ChunkReader> SerializedFileReader<R> {
         })
     }
 
-    /// Filters row group metadata to only those row groups,
-    /// for which the predicate function returns true
+    /// Filter row groups by metadata that match the predicate criteria and row group's midpoint
+    /// are within the `[start, end)` range (if the range is provided).
     pub fn filter_row_groups(
         &mut self,
         predicate: &dyn Fn(&RowGroupMetaData, usize) -> bool,
+        range: Option<(i64, i64)>,

Review comment:
       I'm wondering whether we could have something like [`ParquetReadOptions`](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java#L39), which we could gradually extend with other filter types in the future. With that, we could initialize the reader through something like `new_with_options`, or with a builder pattern, instead of adding a new parameter to `filter_row_groups` for each kind of filter.
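
       To make the suggestion concrete, here is a rough sketch of what an options builder could look like. It is only an illustration, not the actual API: `RowGroupMeta`, `ReadOptions`, `ReadOptionsBuilder`, `with_predicate`, `with_range` and the free function `filter_row_groups` are all hypothetical names, and `RowGroupMeta` is a simplified stand-in for `RowGroupMetaData`. The midpoint is assumed to be `offset + compressed_size / 2`, and the range is treated as `[start, end)` as in the doc comment above.

```rust
/// Simplified stand-in for `RowGroupMetaData` (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct RowGroupMeta {
    /// Byte offset at which the row group starts in the file.
    pub offset: i64,
    /// Compressed size of the row group in bytes.
    pub compressed_size: i64,
}

/// A filter over row groups: receives the metadata and the row group index.
type Predicate = Box<dyn Fn(&RowGroupMeta, usize) -> bool>;

/// Hypothetical options object; every filter type becomes one more predicate.
#[derive(Default)]
pub struct ReadOptions {
    predicates: Vec<Predicate>,
}

/// Hypothetical builder that accumulates filters before the reader is created.
#[derive(Default)]
pub struct ReadOptionsBuilder {
    predicates: Vec<Predicate>,
}

impl ReadOptionsBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds an arbitrary user-supplied predicate.
    pub fn with_predicate(mut self, predicate: Predicate) -> Self {
        self.predicates.push(predicate);
        self
    }

    /// Keeps only row groups whose midpoint falls in `[start, end)`.
    /// Assumption: midpoint = offset + compressed_size / 2.
    pub fn with_range(mut self, start: i64, end: i64) -> Self {
        assert!(start < end, "range must be non-empty");
        self.predicates
            .push(Box::new(move |rg: &RowGroupMeta, _idx: usize| -> bool {
                let mid = rg.offset + rg.compressed_size / 2;
                mid >= start && mid < end
            }));
        self
    }

    pub fn build(self) -> ReadOptions {
        ReadOptions {
            predicates: self.predicates,
        }
    }
}

/// Keeps the row groups that satisfy every predicate in `options`.
pub fn filter_row_groups(
    row_groups: Vec<RowGroupMeta>,
    options: &ReadOptions,
) -> Vec<RowGroupMeta> {
    row_groups
        .into_iter()
        .enumerate()
        .filter(|(idx, rg)| options.predicates.iter().all(|p| p(rg, *idx)))
        .map(|(_, rg)| rg)
        .collect()
}
```

       With this shape, a caller would write something like `ReadOptionsBuilder::new().with_range(start, end).build()` and hand the result to a constructor such as `new_with_options`. A future filter type would then only need a new builder method, and existing call sites would not change.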



