Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/18 17:21:08 UTC

[GitHub] [arrow-datafusion] lvheyang commented on a change in pull request #749: #723 Datafusion add option in ExecutionConfig to enable/disable parquet pruning

lvheyang commented on a change in pull request #749:
URL: https://github.com/apache/arrow-datafusion/pull/749#discussion_r671872318



##########
File path: datafusion/src/datasource/parquet.rs
##########
@@ -38,11 +38,22 @@ pub struct ParquetTable {
     schema: SchemaRef,
     statistics: Statistics,
     max_concurrency: usize,
+    enable_pruning: bool,
 }
 
 impl ParquetTable {
     /// Attempt to initialize a new `ParquetTable` from a file path.
     pub fn try_new(path: impl Into<String>, max_concurrency: usize) -> Result<Self> {
+        ParquetTable::try_new_with_pruning_config(path, max_concurrency, true)
+    }
+
+    /// Attempt to initialize a new `ParquetTable` from a file path. And enable or
+    /// disable the parquet pruning features.
+    pub fn try_new_with_pruning_config(

Review comment:
   I'm not sure that adding this function is a good choice.
   
   My concern is that it is a public function, so many users may come to rely on it. But the `enable_pruning` parameter in its signature is meant to be temporary; we don't want it to stick around for long.
   
   An alternative would be to replace this function with `try_new_with_config(path: impl Into<String>, execution_config: ExecutionConfig)`. I think that is the better option, but it would introduce a dependency on the `execution::context` module, which I believe is the top-level module, so it seems a little odd for the datasource code to depend on it.
   
   Would the second approach be acceptable?
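
To make the trade-off concrete, here is a rough sketch of the second option. It is only an illustration under stated assumptions: `ExecutionConfig` and `ParquetTable` below are simplified stand-ins rather than the real DataFusion types, the field names `concurrency` and `parquet_pruning` are assumed, the error type is a plain `String`, and no file is actually opened.

```rust
/// Hypothetical stand-in for `execution::context::ExecutionConfig`.
/// The field names are assumptions made for this sketch.
#[derive(Debug, Clone)]
pub struct ExecutionConfig {
    pub concurrency: usize,
    pub parquet_pruning: bool,
}

/// Simplified stand-in for `ParquetTable`; schema and statistics are omitted.
#[derive(Debug)]
pub struct ParquetTable {
    path: String,
    max_concurrency: usize,
    enable_pruning: bool,
}

impl ParquetTable {
    /// Existing public constructor: the signature is unchanged and pruning
    /// defaults to enabled.
    pub fn try_new(path: impl Into<String>, max_concurrency: usize) -> Result<Self, String> {
        Self::build(path.into(), max_concurrency, true)
    }

    /// Option 2 from the comment: take the whole config, so the public
    /// signature does not mention the (possibly temporary) pruning flag.
    /// The cost is that this module now depends on the config type.
    pub fn try_new_with_config(
        path: impl Into<String>,
        execution_config: ExecutionConfig,
    ) -> Result<Self, String> {
        Self::build(
            path.into(),
            execution_config.concurrency,
            execution_config.parquet_pruning,
        )
    }

    /// Private helper: the bool never appears in the public API, so it can
    /// be removed later without breaking callers.
    fn build(path: String, max_concurrency: usize, enable_pruning: bool) -> Result<Self, String> {
        if path.is_empty() {
            return Err("path must not be empty".to_string());
        }
        Ok(Self { path, max_concurrency, enable_pruning })
    }

    pub fn path(&self) -> &str {
        &self.path
    }

    pub fn max_concurrency(&self) -> usize {
        self.max_concurrency
    }

    pub fn enable_pruning(&self) -> bool {
        self.enable_pruning
    }
}
```

With this shape, callers of `try_new` are unaffected, and dropping the pruning switch later would only change a config field and the private helper. The dependency on the config type is the part that still looks awkward.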



