Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/01 10:00:25 UTC

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2671: If statistics of column Max/Min value does not exists in parquet file…

Ted-Jiang commented on code in PR #2671:
URL: https://github.com/apache/arrow-datafusion/pull/2671#discussion_r886621951


##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -344,6 +362,10 @@ fn fetch_statistics(
                             table_idx,
                             stats,
                         )
+                    } else {
+                        // If none statistics of current column exists, set the Max/Min Accumulator to None.

Review Comment:
   https://github.com/apache/arrow-datafusion/blob/807b7a5f7eb858e9f7162e1f00ffeeedd0bf2050/datafusion/core/src/datasource/file_format/parquet.rs#L312
   Here, both `max_values` and `min_values` are accumulators across all row groups.

   Setting `max_values[i] = None` inside the loop `for row_group_meta in meta_data.row_groups() {}` assumes that we no longer need statistics for this column. So we need a short cut: once a column's accumulator is `None`, skip that column's statistics when reading the metadata of the following row groups, so that a later row group cannot set another value.
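   To illustrate the short cut, here is a minimal, self-contained sketch. It is not the DataFusion code: `MaxAcc` and `fold_max` are hypothetical names, and per-row-group statistics are modelled as plain `Option<i64>` values rather than parquet metadata. A `None` slot means "max is unknown for this file", and the `continue` is the short cut that keeps later row groups from reviving that slot.

```rust
/// Hypothetical running-max accumulator (a stand-in, not DataFusion's type).
#[derive(Debug, Default)]
struct MaxAcc {
    value: Option<i64>,
}

impl MaxAcc {
    /// Keep the larger of the current max and `v`.
    fn update(&mut self, v: i64) {
        self.value = Some(self.value.map_or(v, |cur| cur.max(v)));
    }
}

/// Fold per-row-group column maxima into one file-level max per column.
/// `row_groups[g][i]` is the max statistic of column `i` in row group `g`,
/// or `None` when that row group carries no statistics for the column.
fn fold_max(row_groups: &[Vec<Option<i64>>], num_cols: usize) -> Vec<Option<i64>> {
    // `Some(acc)` = still accumulating; `None` = statistic unknown for the file.
    let mut max_values: Vec<Option<MaxAcc>> =
        (0..num_cols).map(|_| Some(MaxAcc::default())).collect();

    for rg in row_groups {
        for (slot, stat) in max_values.iter_mut().zip(rg.iter()) {
            // Short cut: a column already marked unknown is skipped in every
            // later row group, so its statistics are never read again.
            if slot.is_none() {
                continue;
            }
            match stat {
                // `get_or_insert_with` would revive a `None` slot, which is
                // why the short cut above must run first.
                Some(v) => slot.get_or_insert_with(MaxAcc::default).update(*v),
                // Missing statistics in any row group make the file-level
                // max unknown, so drop the accumulator.
                None => *slot = None,
            }
        }
    }

    max_values
        .into_iter()
        .map(|acc| acc.and_then(|a| a.value))
        .collect()
}
```

   For example, with two row groups where column 1 has no statistics in the first one, `fold_max(&[vec![Some(3), None], vec![Some(7), Some(5)]], 2)` yields `[Some(7), None]`. Without the `continue`, the second row group would re-populate column 1 with `Some(5)`, a max that ignores the first row group.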



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org