Posted to issues@spark.apache.org by "Ashish Singh (Jira)" <ji...@apache.org> on 2021/07/21 19:12:00 UTC

[jira] [Commented] (SPARK-31162) Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing

    [ https://issues.apache.org/jira/browse/SPARK-31162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385071#comment-17385071 ] 

Ashish Singh commented on SPARK-31162:
--------------------------------------

This is needed for more than Hive bucketed writes. For example, it is also needed so that custom partitioners from Hive (implemented with Hive UDFs) partition data the same way Hive does.

Assigning it to myself, but let me know if you are working on this already [~maropu].

> Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31162
>                 URL: https://issues.apache.org/jira/browse/SPARK-31162
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 3.1.0
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> I couldn't find a configuration parameter to choose Hive hashing instead of Spark's default Murmur hash when performing a Spark bucketBy operation. Following a discussion with [~maropu] and [~hyukjin.kwon], it was suggested that I open a new JIRA.



