Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/09 13:39:20 UTC

[GitHub] [iceberg] bryanck commented on pull request #5215: Core: Update MetricsConfig to use a default for first 32 columns

bryanck commented on PR #5215:
URL: https://github.com/apache/iceberg/pull/5215#issuecomment-1179546876

   This is already merged, but I thought I'd leave feedback anyway, in case it is useful.
   
   As a data engineer, many tables I have maintained have more than 32 top-level columns. Often columns used for partitioning, sorting, auditing, and so forth are put at the end of a table schema, but these are some of the most frequently used in filtering. Also, additional columns are generally added at the end of the schema. The assumption that the first columns in a table schema are the most important to have stats on is not always accurate. 
   
   In testing 0.14, I ran into missing stats on tables, which was confusing and difficult to debug. I imagine it would be even more confusing for those new to Iceberg, who are the most likely to leave settings at the default.
   
   I feel a more sensible default is to leave it the same as in previous Iceberg versions (i.e. no column limit). Then an option could be introduced to limit the number of columns, so that those who prefer a limit can set it on their tables, e.g. "first(32)". I feel it is better to err on the side of too many stats and dial that back as needed.
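
   For anyone hitting the missing stats in the meantime, one possible workaround is to pin a metrics mode on the affected columns through the existing per-column table properties (`write.metadata.metrics.column.<name>`). The sketch below only builds the property map; the helper names, the sample schema, and the assumption that the limit counts top-level columns in schema order are all illustrative, not taken from the PR.

```python
# Minimal sketch: build Iceberg table properties that pin a metrics mode on
# columns that fall past the default column limit.
# Assumptions (not from the PR): the helper names, the sample schema, and that
# the limit applies to top-level columns in schema order.

METRICS_COLUMN_PREFIX = "write.metadata.metrics.column."


def columns_past_limit(schema_columns, limit=32):
    """Return the column names that come after the first `limit` columns."""
    return list(schema_columns[limit:])


def pinned_metrics_properties(columns, mode="truncate(16)"):
    """Map each column to a per-column metrics property with the given mode."""
    return {METRICS_COLUMN_PREFIX + name: mode for name in columns}


# Hypothetical wide table: 40 data columns, then partition/audit columns last.
schema = [f"c{i}" for i in range(40)] + ["event_date", "updated_at"]

props = pinned_metrics_properties(columns_past_limit(schema))

for key in sorted(props):
    print(f"{key}={props[key]}")
```

   The resulting key/value pairs could then be applied as ordinary table properties with whatever engine or API is in use. Valid modes are `none`, `counts`, `truncate(N)`, and `full`.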
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org