Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/08 00:45:15 UTC

[GitHub] [arrow] seddonm1 commented on a change in pull request #8860: ARROW-10783: [Rust] [DataFusion] Implement row count statistics for Parquet TableProviderParquet statistics [WIP]

seddonm1 commented on a change in pull request #8860:
URL: https://github.com/apache/arrow/pull/8860#discussion_r537793320



##########
File path: rust/datafusion/src/datasource/datasource.rs
##########
@@ -24,6 +24,15 @@ use crate::arrow::datatypes::SchemaRef;
 use crate::error::Result;
 use crate::physical_plan::ExecutionPlan;
 
+/// The table statistics
+#[derive(Clone, Debug)]
+pub struct Statistics {

Review comment:
       This is repeating what you have done in #8831

##########
File path: rust/datafusion/src/physical_plan/parquet.rs
##########
@@ -99,6 +100,38 @@ impl ParquetExec {
             batch_size,
         }
     }
+
+    /// Get the statistics from parquet file format
+    pub fn statistics(&self) -> Option<Statistics> {

Review comment:
       By doing this you are parsing the Parquet file metadata a second time, even though it has already been parsed.
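
       One way to avoid the second parse, as a minimal sketch only: compute the statistics once, while the metadata is already in hand at construction time, and have `statistics()` return the stored value. Everything below is a hypothetical stand-in and not the DataFusion or parquet crate API: `CountingReader`, `FileMetadata`, and `ParquetExecSketch` are made up for illustration, and `Statistics` is reduced to a single assumed `num_rows` field.

```rust
/// Hypothetical stand-in for the table statistics struct in this PR.
#[derive(Clone, Debug, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
}

/// Hypothetical stand-in for parsed Parquet file metadata:
/// one row count per row group.
pub struct FileMetadata {
    pub row_group_rows: Vec<usize>,
}

/// Hypothetical reader that counts how often the metadata is parsed,
/// so the "parse once" property can be checked.
pub struct CountingReader {
    row_group_rows: Vec<usize>,
    pub parse_count: usize,
}

impl CountingReader {
    pub fn new(row_group_rows: Vec<usize>) -> Self {
        Self { row_group_rows, parse_count: 0 }
    }

    /// Stand-in for the expensive metadata parse.
    pub fn parse_metadata(&mut self) -> FileMetadata {
        self.parse_count += 1;
        FileMetadata { row_group_rows: self.row_group_rows.clone() }
    }
}

/// Sketch of an execution plan that caches statistics at construction.
pub struct ParquetExecSketch {
    statistics: Statistics,
}

impl ParquetExecSketch {
    pub fn new(reader: &mut CountingReader) -> Self {
        // The metadata is parsed exactly once, here.
        let metadata = reader.parse_metadata();
        let num_rows: usize = metadata.row_group_rows.iter().sum();
        Self {
            statistics: Statistics { num_rows: Some(num_rows) },
        }
    }

    /// Returns the cached statistics without touching the file again.
    pub fn statistics(&self) -> Option<Statistics> {
        Some(self.statistics.clone())
    }
}
```

       With this shape, calling `statistics()` any number of times leaves `parse_count` at 1. The trade-off is that the row count is computed even when no caller asks for it, which seems cheap once the metadata is already parsed.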



