Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/12 15:02:29 UTC

[GitHub] [iceberg] aokolnychyi commented on pull request #2048: Add validation for metrics config during parquet write

aokolnychyi commented on pull request #2048:
URL: https://github.com/apache/iceberg/pull/2048#issuecomment-758714296


   @rdblue @shardulm94 @RussellSpitzer, this is not related to this PR, but I have a general question: the metrics config relies on column names instead of column ids because it is controlled through table properties. I was wondering whether it is safe to do so.
   
   I considered the following cases:
   - While writing new data, it should always be OK to follow the name-based approach if the config is up to date.
   - While importing data (assuming we have a correct name mapping), we will assign correct ids to columns and then use the ids to find the current aliases in the schema, so this should work.
   - When fixing (re-importing) metadata (assuming we have renamed a column through Iceberg and have no name mapping), we will just use the column id in the file to find the current name, so this should work.
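   
   To make the cases above concrete, here is a minimal sketch of the lookup they describe. It is not Iceberg's actual implementation. The function `resolve_metrics_modes` and its parameters are hypothetical names used only for illustration. The property keys are the `write.metadata.metrics.*` table properties, and the fallback of `truncate(16)` assumes the documented default mode.

```python
from typing import Dict, Iterable

# Table property keys for the metrics config (keyed by column NAME).
DEFAULT_KEY = "write.metadata.metrics.default"
COLUMN_PREFIX = "write.metadata.metrics.column."


def resolve_metrics_modes(
    file_column_ids: Iterable[int],
    current_names_by_id: Dict[int, str],
    table_properties: Dict[str, str],
) -> Dict[int, str]:
    """Hypothetical helper: pick a metrics mode for each column id in a file.

    The id stored in the file is first translated to the column's CURRENT
    name in the table schema, and only then is the name-based config read.
    """
    # Assumption: fall back to truncate(16) when no default is configured.
    default_mode = table_properties.get(DEFAULT_KEY, "truncate(16)")
    modes = {}
    for col_id in file_column_ids:
        current_name = current_names_by_id.get(col_id)
        if current_name is None:
            # The column no longer exists in the schema; use the default.
            modes[col_id] = default_mode
        else:
            modes[col_id] = table_properties.get(
                COLUMN_PREFIX + current_name, default_mode
            )
    return modes


# Column 2 was written as "ts" and later renamed to "event_ts".
schema = {1: "id", 2: "event_ts"}

# Config that follows the rename is picked up through the column id.
up_to_date = {COLUMN_PREFIX + "event_ts": "full"}
print(resolve_metrics_modes([1, 2], schema, up_to_date))

# Config still keyed by the old name silently falls back to the default.
stale = {COLUMN_PREFIX + "ts": "full"}
print(resolve_metrics_modes([1, 2], schema, stale))
```

   The second call illustrates the "if the config is up-to-date" caveat: a property still keyed by the old column name does not fail, it is just ignored.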
   
   Overall, it seems safe to me. Any cases I missed?
   

