Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2018/02/27 07:39:00 UTC
[jira] [Resolved] (SPARK-23445) ColumnStat refactoring
[ https://issues.apache.org/jira/browse/SPARK-23445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-23445.
-----------------------------
Resolution: Fixed
Assignee: Juliusz Sompolski
Fix Version/s: 2.4.0
> ColumnStat refactoring
> ----------------------
>
> Key: SPARK-23445
> URL: https://issues.apache.org/jira/browse/SPARK-23445
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Juliusz Sompolski
> Assignee: Juliusz Sompolski
> Priority: Major
> Fix For: 2.4.0
>
>
> Refactor ColumnStat to be more flexible.
> * Split {{ColumnStat}} and {{CatalogColumnStat}}, just as {{CatalogStatistics}} is split from {{Statistics}}. This decouples how the statistics are stored from how they are processed in the query plan. {{CatalogColumnStat}} keeps {{min}} and {{max}} as {{String}}, so it does not depend on dataType information.
> * For {{CatalogColumnStat}}, parse column names from the property names in the metastore (the {{KEY_VERSION}} property), not from the metastore schema. This allows the catalog to read stats into {{CatalogColumnStat}}s even if the schema itself is not in the metastore.
> * Make all fields optional. {{min}}, {{max}} and {{histogram}} were already optional for columns. Making every field optional is more consistent and adds flexibility, e.g. to drop fields during transformations when they are difficult or impossible to calculate.
> The added flexibility will make it possible to have alternative implementations for stats, and will separate stats collection from stats and estimation processing in plans.
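
The three points in the description can be illustrated with a small sketch. It is written in Python for brevity; Spark's own classes are Scala, and this is not the actual implementation. The property prefix, the field names and the helper functions (read_catalog_stats, to_plan_stat) are assumptions made up for the example; only ColumnStat, CatalogColumnStat, min, max and KEY_VERSION come from the issue text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical property layout: "<PREFIX><column>.<field>".
PREFIX = "colStats."
KEY_VERSION = "version"  # assumed value; the issue only names the constant


@dataclass(frozen=True)
class CatalogColumnStat:
    """Storage-side stats: every field optional, min/max kept as raw strings."""
    distinct_count: Optional[int] = None
    min: Optional[str] = None
    max: Optional[str] = None
    null_count: Optional[int] = None


@dataclass(frozen=True)
class ColumnStat:
    """Plan-side stats: min/max converted to typed values."""
    distinct_count: Optional[int] = None
    min: Optional[object] = None
    max: Optional[object] = None
    null_count: Optional[int] = None


def _opt_int(value: Optional[str]) -> Optional[int]:
    return None if value is None else int(value)


def read_catalog_stats(props: Dict[str, str]) -> Dict[str, CatalogColumnStat]:
    """Discover column names from '<column>.version' keys, without a schema."""
    suffix = "." + KEY_VERSION
    columns = [
        key[len(PREFIX):-len(suffix)]
        for key in props
        if key.startswith(PREFIX)
        and key.endswith(suffix)
        and len(key) > len(PREFIX) + len(suffix)  # skip an empty column name
    ]
    stats = {}
    for col in columns:
        def get(field: str, col: str = col) -> Optional[str]:
            return props.get(f"{PREFIX}{col}.{field}")

        stats[col] = CatalogColumnStat(
            distinct_count=_opt_int(get("distinctCount")),
            min=get("min"),
            max=get("max"),
            null_count=_opt_int(get("nullCount")),
        )
    return stats


def to_plan_stat(stat: CatalogColumnStat,
                 parse: Callable[[str], object]) -> ColumnStat:
    """Apply data-type knowledge only here, when moving to the plan side."""
    return ColumnStat(
        distinct_count=stat.distinct_count,
        min=None if stat.min is None else parse(stat.min),
        max=None if stat.max is None else parse(stat.max),
        null_count=stat.null_count,
    )
```

In this sketch, read_catalog_stats never consults a table schema, and a column whose properties omit some statistics still yields a valid CatalogColumnStat with those fields left unset. The data type enters only in to_plan_stat, for example to_plan_stat(stat, int) for an integer column.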
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)