Posted to issues@spark.apache.org by "Maxim Gekk (JIRA)" <ji...@apache.org> on 2019/01/24 22:14:00 UTC

[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat

    [ https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751634#comment-16751634 ] 

Maxim Gekk commented on SPARK-26654:
------------------------------------

[~cloud_fan][~hvanhovell][~srowen] I believe that saving statistics for TimestampType columns without a time zone can produce inaccurate results if the statistics are read back in a Spark session with a different time zone, which in turn can hurt query planning. This can be fixed in one of two ways: either add the time zone when serializing TimestampType column statistics, which changes the timestamp format (so old versions of Spark will not be able to read the statistics back unless the version is changed), or store the original time zone separately, alongside the statistics.
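
The drift described above can be illustrated with a minimal sketch. This is not Spark's code: the function names, the two fixed-offset zones, and the sample value are all assumptions chosen for illustration, and plain Python `datetime` stands in for the catalog's string serialization. It shows a column maximum written as zone-less text in one session zone and parsed in another, next to the variant that keeps the offset in the string.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones standing in for two session time zones.
WRITER_TZ = timezone(timedelta(hours=-8))
READER_TZ = timezone(timedelta(hours=1))

ZONELESS_FORMAT = "%Y-%m-%d %H:%M:%S"


def serialize_without_zone(instant, session_tz):
    """Render an instant as local wall-clock text, dropping the offset."""
    return instant.astimezone(session_tz).strftime(ZONELESS_FORMAT)


def parse_without_zone(text, session_tz):
    """Interpret zone-less text as wall-clock time in the reader's zone."""
    return datetime.strptime(text, ZONELESS_FORMAT).replace(tzinfo=session_tz)


def serialize_with_zone(instant, session_tz):
    """Render an instant as ISO-8601 text that keeps the UTC offset."""
    return instant.astimezone(session_tz).isoformat()


def parse_with_zone(text):
    """Parse ISO-8601 text; the embedded offset makes the reader's zone irrelevant."""
    return datetime.fromisoformat(text)


# Illustrative column maximum (an assumption, not real statistics).
original_max = datetime(2019, 1, 24, 22, 14, 0, tzinfo=timezone.utc)

# Zone-less round trip across sessions with different time zones.
stored = serialize_without_zone(original_max, WRITER_TZ)
restored = parse_without_zone(stored, READER_TZ)
drift = original_max - restored

# Round trip that keeps the offset in the serialized string.
stored_with_zone = serialize_with_zone(original_max, WRITER_TZ)
restored_with_zone = parse_with_zone(stored_with_zone)

print(stored, "->", restored.isoformat(), "drift:", drift)
print(stored_with_zone, "->", restored_with_zone.isoformat())
```

In this sketch the zone-less round trip moves the stored maximum by the difference between the two session offsets, while the round trip that keeps the offset restores the same instant. The second variant corresponds to the first fix mentioned above and carries the same cost: the serialized text changes shape, so a reader expecting the old format would need to handle both.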

> Use Timestamp/DateFormatter in CatalogColumnStat
> ------------------------------------------------
>
>                 Key: SPARK-26654
>                 URL: https://issues.apache.org/jira/browse/SPARK-26654
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> Need to switch fromExternalString to Timestamp/DateFormatters, in particular:
> https://github.com/apache/spark/blob/3b7395fe025a4c9a591835e53ac6ca05be6868f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L481-L482



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)