Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/07 04:59:48 UTC

[GitHub] [iceberg] zhongyujiang commented on pull request #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

zhongyujiang commented on PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#issuecomment-1305081026

   @rdblue sure.
   When collecting metrics from a Parquet footer, Iceberg [converts](https://github.com/apache/iceberg/blob/167a8ccd7c578296c40f8fc61c90135e71cf1183/parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java#L107) the file's MessageType to an Iceberg Schema and [uses](https://github.com/apache/iceberg/blob/167a8ccd7c578296c40f8fc61c90135e71cf1183/core/src/main/java/org/apache/iceberg/MetricsUtil.java#L56) this schema to look up the column name for each field ID. It then uses that column name to get the corresponding metrics mode.
   
   However, Iceberg escapes special characters in field names when converting an Iceberg Schema to a Parquet MessageType, and those escaped names cannot be restored when converting the Parquet MessageType back to an Iceberg Schema. In other words, we are currently using the escaped column names to look up their metrics modes, which may produce incorrect results because MetricsConfig does not recognize the escaped names.
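
To make the mismatch concrete, here is a minimal stand-alone sketch. It does not use any Iceberg classes: `EscapedNameMetricsSketch`, `escape`, `fileSchemaNames` and `modeForField` are hypothetical names, the schemas are modeled as plain field-ID-to-name maps, and the escaping rule (`_x` plus the hex code of each special character) is an assumption made for illustration, not necessarily the exact rule Iceberg applies.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these names are Iceberg APIs.
final class EscapedNameMetricsSketch {

  // Assumed escaping rule: every character that is not a letter, digit or
  // underscore is replaced by "_x" followed by its upper-case hex code.
  static String escape(String name) {
    StringBuilder sb = new StringBuilder();
    for (char c : name.toCharArray()) {
      if (Character.isLetterOrDigit(c) || c == '_') {
        sb.append(c);
      } else {
        sb.append("_x").append(Integer.toHexString(c).toUpperCase());
      }
    }
    return sb.toString();
  }

  // Models the schema read back from the file footer: same field IDs,
  // but the names are the escaped ones and the originals are lost.
  static Map<Integer, String> fileSchemaNames(Map<Integer, String> tableSchemaNames) {
    Map<Integer, String> result = new HashMap<>();
    for (Map.Entry<Integer, String> e : tableSchemaNames.entrySet()) {
      result.put(e.getKey(), escape(e.getValue()));
    }
    return result;
  }

  // Field ID -> column name -> metrics mode, falling back to the default
  // mode when no per-column mode is configured under that name.
  static String modeForField(
      Map<Integer, String> idToName,
      Map<String, String> columnModes,
      int fieldId,
      String defaultMode) {
    String name = idToName.get(fieldId);
    return columnModes.getOrDefault(name, defaultMode);
  }

  public static void main(String[] args) {
    Map<Integer, String> tableSchema = new HashMap<>();
    tableSchema.put(1, "user-id");

    // The per-column mode is configured with the original column name.
    Map<String, String> columnModes = new HashMap<>();
    columnModes.put("user-id", "full");
    String defaultMode = "truncate(16)";

    Map<Integer, String> fileSchema = fileSchemaNames(tableSchema);

    // Name resolved through the file schema: the configured mode is missed.
    System.out.println(modeForField(fileSchema, columnModes, 1, defaultMode));
    // Name resolved through the table schema: the configured mode is found.
    System.out.println(modeForField(tableSchema, columnModes, 1, defaultMode));
  }
}
```

In this toy model the lookup through the file-derived names silently falls back to the default mode, while the lookup through the original names finds the configured one.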
   
   The ORC path does not have this problem because special characters are not escaped when converting to an ORC schema, and ORC [itself](https://lists.apache.org/thread/93xbnbs0mr0zxx4fzvrz10m5mmd4qb5w) can handle any UTF-8 characters in column names.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

