Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/01/30 02:56:00 UTC
[jira] [Commented] (SPARK-23445) ColumnStat refactoring
[ https://issues.apache.org/jira/browse/SPARK-23445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484292#comment-17484292 ]
Apache Spark commented on SPARK-23445:
--------------------------------------
User 'Stove-hust' has created a pull request for this issue:
https://github.com/apache/spark/pull/35363
> ColumnStat refactoring
> ----------------------
>
> Key: SPARK-23445
> URL: https://issues.apache.org/jira/browse/SPARK-23445
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Juliusz Sompolski
> Assignee: Juliusz Sompolski
> Priority: Major
> Fix For: 2.4.0
>
>
> Refactor ColumnStat to be more flexible.
> * Split {{ColumnStat}} and {{CatalogColumnStat}}, just as {{CatalogStatistics}} is split from {{Statistics}}. This decouples how the statistics are stored from how they are processed in the query plan. {{CatalogColumnStat}} keeps {{min}} and {{max}} as {{String}}, so it does not depend on dataType information.
> * For {{CatalogColumnStat}}, parse column names from property names in the metastore (the {{KEY_VERSION}} property), not from the metastore schema. This allows the catalog to read stats into {{CatalogColumnStat}} objects even if the schema itself is not in the metastore.
> * Make all fields optional. {{min}}, {{max}} and {{histogram}} for columns were already optional. Making every field optional is more consistent and gives the flexibility to, e.g., drop some fields through transformations if they are difficult or impossible to calculate.
> The added flexibility makes alternative implementations for stats possible, and separates stats collection from stats processing and estimation in plans.
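
The split described above can be illustrated with a minimal, self-contained sketch. It is not Spark's actual code: `StatsSketch`, `CatalogStat`, `PlanStat`, `SimpleType`, `parse`, `columnNames` and the `colStats.` key prefix are all hypothetical names chosen for illustration, and the real classes carry more fields (for example a histogram). It only shows the three ideas from the description: string-typed {{min}}/{{max}} on the storage side, all fields optional, and column names recovered from property keys rather than from a schema.

```scala
object StatsSketch {
  // Hypothetical stand-in for Spark's DataType hierarchy.
  sealed trait SimpleType
  case object IntType extends SimpleType
  case object DoubleType extends SimpleType
  case object StringType extends SimpleType

  // Plan-side statistics: min/max are typed values. All fields optional.
  final case class PlanStat(
      distinctCount: Option[BigInt] = None,
      min: Option[Any] = None,
      max: Option[Any] = None,
      nullCount: Option[BigInt] = None)

  // Storage-side statistics: min/max stay as strings, so no data type
  // is needed to read them back from the metastore.
  final case class CatalogStat(
      distinctCount: Option[BigInt] = None,
      min: Option[String] = None,
      max: Option[String] = None,
      nullCount: Option[BigInt] = None) {

    // Convert to the plan-side form once the column's data type is known.
    def toPlanStat(dataType: SimpleType): PlanStat =
      PlanStat(
        distinctCount,
        min.map(parse(_, dataType)),
        max.map(parse(_, dataType)),
        nullCount)
  }

  // Parse a stored string into a value of the given (assumed) type.
  def parse(raw: String, dataType: SimpleType): Any = dataType match {
    case IntType    => raw.toInt
    case DoubleType => raw.toDouble
    case StringType => raw
  }

  // Recover column names from property keys of the assumed form
  // "<prefix><column>.version", without consulting any schema.
  def columnNames(
      props: Map[String, String],
      prefix: String = "colStats."): Set[String] = {
    val suffix = ".version"
    props.keySet.collect {
      case k if k.startsWith(prefix) && k.endsWith(suffix) &&
          k.length > prefix.length + suffix.length =>
        k.substring(prefix.length, k.length - suffix.length)
    }
  }
}
```

With this sketch, `StatsSketch.CatalogStat(min = Some("1"), max = Some("42")).toPlanStat(StatsSketch.IntType)` yields a `PlanStat` whose {{min}} and {{max}} are integers, while fields that were never stored simply remain `None`.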
--
This message was sent by Atlassian Jira
(v8.20.1#820001)