Posted to issues@spark.apache.org by "Takeshi Yamamuro (Jira)" <ji...@apache.org> on 2020/03/16 01:08:00 UTC

[jira] [Updated] (SPARK-17495) Hive hash implementation

     [ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro updated SPARK-17495:
-------------------------------------
    Affects Version/s: 2.2.0

> Hive hash implementation
> ------------------------
>
>                 Key: SPARK-17495
>                 URL: https://issues.apache.org/jira/browse/SPARK-17495
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Minor
>              Labels: bulk-closed
>
> Spark internally uses Murmur3Hash for partitioning. This differs from the hash function used by Hive. For queries that use bucketing, this leads to different results when the same query is run on both engines. We want to offer backward compatibility so that users can switch parts of their applications between the engines without observing regressions.
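
A minimal sketch (Scala, not from the ticket) of the mismatch described above. It assumes Spark's built-in `hash` SQL function (Murmur3-based) on the Spark side, and that Hive's bucketing hash for a primitive INT column is the integer value itself, so the two engines can assign the same key to different buckets:

    import org.apache.spark.sql.SparkSession

    object HashMismatchSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("hive-vs-spark-hash")
          .getOrCreate()
        import spark.implicits._

        val numBuckets = 8
        val df = Seq(1, 2, 42).toDF("key")

        // Spark bucket id: pmod(murmur3(key), numBuckets), via the built-in hash() function.
        // Hive bucket id for an INT key: pmod(key, numBuckets), assuming Hive's hash of a
        // primitive int is the int value itself.
        df.selectExpr(
          "key",
          s"pmod(hash(key), $numBuckets) AS spark_bucket",
          s"pmod(key, $numBuckets) AS hive_bucket"
        ).show()

        spark.stop()
      }
    }

Running this shows the two bucket columns disagreeing for most keys, which is why a table bucketed by Hive cannot be read back bucket-for-bucket by Spark (and vice versa) without a Hive-compatible hash implementation.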


