You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2018/02/27 07:39:00 UTC

[jira] [Resolved] (SPARK-23445) ColumnStat refactoring

     [ https://issues.apache.org/jira/browse/SPARK-23445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-23445.
-----------------------------
       Resolution: Fixed
         Assignee: Juliusz Sompolski
    Fix Version/s: 2.4.0

> ColumnStat refactoring
> ----------------------
>
>                 Key: SPARK-23445
>                 URL: https://issues.apache.org/jira/browse/SPARK-23445
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Refactor ColumnStat to be more flexible.
>  * Split {{ColumnStat}} and {{CatalogColumnStat}} just like {{CatalogStatistics}} is split from {{Statistics}}. This detaches how the statistics are stored from how they are processed in the query plan. {{CatalogColumnStat}} keeps {{min}} and {{max}} as {{String}}, making it not depend on dataType information.
>  * For {{CatalogColumnStat}}, parse column names from property names in the metastore ({{KEY_VERSION }}property), not from metastore schema. This allows the catalog to read stats into {{CatalogColumnStat}}s even if the schema itself is not in the metastore.
>  * Make all fields optional. {{min}}, {{max}} and {{histogram}} for columns were optional already. Having them all optional is more consistent, and gives flexibility to e.g. drop some of the fields through transformations if they are difficult / impossible to calculate.
> The added flexibility will make it possible to have alternative implementations for stats, and separates stats collection from stats and estimation processing in plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org