You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/07 06:14:39 UTC

[GitHub] [iceberg] zhongyujiang commented on a diff in pull request #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

zhongyujiang commented on code in PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#discussion_r1015035815


##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java:
##########
@@ -75,23 +76,27 @@ public static Metrics fileMetrics(InputFile file, MetricsConfig metricsConfig) {
   public static Metrics fileMetrics(
       InputFile file, MetricsConfig metricsConfig, NameMapping nameMapping) {
     try (ParquetFileReader reader = ParquetFileReader.open(ParquetIO.file(file))) {
-      return footerMetrics(reader.getFooter(), Stream.empty(), metricsConfig, nameMapping);
+      return footerMetrics(reader.getFooter(), Stream.empty(), metricsConfig, nameMapping, null);

Review Comment:
   This is used to migrate external files to Iceberg tables,  I am thinking if those files are generated by Iceberg, we may still not be able to collect metrics according to the correct metric mode with current change. 
   
   To solve this , I think we can change TableMigrationUtil to pass Iceberg schema also; 
   Or we can update MetricsConfig to make it tracks mapping between column ids metric modes, and use column ids to get metric modes directly when collecting parquet metrics, but that will need all writers to update from `MetricsConfig.fromProperties(Map<String, String> props)` to `MetricsConfig from(Map<String, String> props, Schema schema, SortOrder order)` (this is a private method now). 
   
   I prefer the second way and I can update the releated code in this repo, but I don't know if any external projects uses `MetricsConfig.fromProperties(Map<String, String> props)` too. What do you think about it? @rdblue 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org