Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/18 07:28:22 UTC

[GitHub] [arrow-datafusion] houqp commented on a change in pull request #897: Refactor ParquetExec in preparation for implementing parallel scans for statistics

houqp commented on a change in pull request #897:
URL: https://github.com/apache/arrow-datafusion/pull/897#discussion_r690974538



##########
File path: datafusion/src/physical_plan/parquet.rs
##########
@@ -582,14 +389,244 @@ impl ParquetExec {
 
 impl ParquetPartition {
     /// Create a new parquet partition
-    pub fn new(filenames: Vec<String>, statistics: Statistics) -> Self {
+    pub fn new(
+        filenames: Vec<String>,

Review comment:
       It seems like the `PartitionedFile` abstraction proposed by @yjshen in https://github.com/apache/arrow-datafusion/pull/811/files#diff-72f3a52c56e83e00d8c605d461f092617a3c205619376bb373069c662f9cfc93R54 would help solve this problem?
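
       To make the suggestion concrete, here is a rough sketch of how such an abstraction might look. It is not copied from #811: the field names, the simplified `Statistics` stand-in, and the `statistics()` helper are all assumptions made for illustration. The idea is that each file carries its own statistics, so the partition constructor no longer needs a separate `statistics` argument next to `filenames`.

```rust
/// Simplified stand-in for DataFusion's `Statistics` (hypothetical fields).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub total_byte_size: Option<usize>,
}

/// Hypothetical: a single file together with the statistics gathered for it.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionedFile {
    pub path: String,
    pub statistics: Statistics,
}

/// Hypothetical partition that holds files instead of bare file names.
#[derive(Debug, Clone)]
pub struct ParquetPartition {
    pub files: Vec<PartitionedFile>,
}

impl ParquetPartition {
    /// Create a new parquet partition from files that carry their own statistics.
    pub fn new(files: Vec<PartitionedFile>) -> Self {
        Self { files }
    }

    /// Derive partition-level statistics by summing the per-file values.
    /// Summing `Option`s yields `None` as soon as any file lacks the value.
    pub fn statistics(&self) -> Statistics {
        Statistics {
            num_rows: self.files.iter().map(|f| f.statistics.num_rows).sum(),
            total_byte_size: self
                .files
                .iter()
                .map(|f| f.statistics.total_byte_size)
                .sum(),
        }
    }
}
```

       With something along these lines, per-file statistics could be collected independently (for example in parallel) and merged afterwards, though whether that matches the design in #811 would need to be checked against that PR.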




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.