
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/05/02 04:38:02 UTC

[GitHub] [incubator-iceberg] aokolnychyi opened a new issue #173: Make collection of lower/upper bounds configurable

URL: https://github.com/apache/incubator-iceberg/issues/173
 
 
   Lower/upper bounds can noticeably increase the size of metadata for datasets with a large number of columns, which can degrade the performance of metadata operations. In most cases, users cluster/sort their data by only a subset of columns to speed up queries with predicates on those columns, so bounds on the remaining columns are usually less useful for file pruning. In my view, it is reasonable to let the user configure the list of columns for which we collect stats.
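   To make the proposal concrete, here is a minimal sketch of applying a user-configured column list when writing file metadata. It assumes bounds are held as maps from field ID to serialized value (the shape Iceberg's `DataFile.lowerBounds()` uses); the `BoundsFilter` class and how the set of field IDs would be configured are hypothetical, not an existing Iceberg API.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps lower/upper bounds only for the field IDs the
// user configured. Bounds are assumed to be keyed by Iceberg field ID.
final class BoundsFilter {
  private final Set<Integer> statsFieldIds;

  BoundsFilter(Set<Integer> statsFieldIds) {
    // Defensive copy so later changes to the caller's set have no effect.
    this.statsFieldIds = new HashSet<>(statsFieldIds);
  }

  // Returns a new map containing only the bounds for configured columns;
  // bounds for all other columns are dropped and never reach the manifest.
  Map<Integer, ByteBuffer> filter(Map<Integer, ByteBuffer> bounds) {
    Map<Integer, ByteBuffer> kept = new HashMap<>();
    for (Map.Entry<Integer, ByteBuffer> e : bounds.entrySet()) {
      if (statsFieldIds.contains(e.getKey())) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }
}
```

   The same filter would be applied to both the lower and the upper bound maps, so metadata size grows with the number of configured columns rather than with the width of the table.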
   
   Another option is to add a flag that keeps stats only for top-level columns. I believe the first approach is more flexible and will help us properly support nested datasets. Together with the upcoming changes for nested data in Spark, this will be really beneficial.
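   For comparison, a top-level-only flag could be implemented by deriving the set of field IDs from the schema instead of asking the user for it. The sketch below assumes a hypothetical flattened schema view (`Field`, with a parent ID that is null for top-level columns); it is an illustration, not Iceberg's schema API. Only primitive fields are kept, since struct columns themselves carry no bounds.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical flattened view of a schema: each field knows its parent's ID
// (null for top-level columns) and whether it is a primitive (leaf) type.
final class Field {
  final int id;
  final Integer parentId;
  final boolean primitive;

  Field(int id, Integer parentId, boolean primitive) {
    this.id = id;
    this.parentId = parentId;
    this.primitive = primitive;
  }
}

final class TopLevelStats {
  // Collects IDs of top-level primitive columns; nested leaves such as
  // location.lat are skipped, which is exactly what limits this approach.
  static Set<Integer> topLevelFieldIds(List<Field> fields) {
    Set<Integer> ids = new HashSet<>();
    for (Field f : fields) {
      if (f.parentId == null && f.primitive) {
        ids.add(f.id);
      }
    }
    return ids;
  }
}
```

   The resulting ID set plays the role a user-supplied column list would, but it can never include a nested field, even when the data is sorted by one. That is why an explicit column list looks like the more flexible choice for nested datasets.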
