Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:24:06 UTC

[jira] [Updated] (SPARK-16481) Spark does not update statistics when making use of Hive partitions

     [ https://issues.apache.org/jira/browse/SPARK-16481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-16481:
---------------------------------
    Labels: bulk-closed  (was: )

> Spark does not update statistics when making use of Hive partitions
> -------------------------------------------------------------------
>
>                 Key: SPARK-16481
>                 URL: https://issues.apache.org/jira/browse/SPARK-16481
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Fokko Driesprong
>            Priority: Major
>              Labels: bulk-closed
>
> Hi all,
> I've seen some strange behaviour when using Hive partitions. It turned out that, when using Hive partitions, the Parquet statistics are not updated properly when new data is inserted. I've isolated the issue in the following repository:
> https://github.com/Fokko/spark-strange-refresh-behaviour
> The workaround right now is to refresh the table by hand (see the sketch after this quoted report), which is quite error-prone because it can easily be forgotten.
> Cheers, Fokko Driesprong.
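
For reference, the manual refresh the reporter describes looks roughly like the sketch below. This is a minimal illustration, assuming Spark 1.6.x with a HiveContext; the table and column names (my_db.events, dt, id, payload) are hypothetical and not taken from the report. On Spark 2.x and later the equivalent call is spark.catalog.refreshTable.

    // Minimal sketch of the manual refresh workaround (assumed names, not from the report).
    import org.apache.spark.sql.hive.HiveContext

    // sc: an existing SparkContext, e.g. the one provided by spark-shell
    val hiveContext = new HiveContext(sc)

    // Insert into a partition of a hypothetical Hive-partitioned Parquet table.
    hiveContext.sql(
      "INSERT INTO TABLE my_db.events PARTITION (dt = '2016-07-11') " +
      "SELECT id, payload FROM my_db.staging_events")

    // Without this explicit refresh, later reads of my_db.events may use
    // stale cached Parquet metadata/statistics.
    hiveContext.refreshTable("my_db.events")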



