Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2020/01/03 06:00:00 UTC

[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

     [ https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee updated FLINK-11964:
---------------------------------
    Fix Version/s: 1.10.0

> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-11964
>                 URL: https://issues.apache.org/jira/browse/FLINK-11964
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>            Reporter: Jingsong Lee
>            Priority: Major
>             Fix For: 1.10.0
>
>
> HybridHashTable first selects a partition according to the hashCode and then selects a bucket within that partition according to the same hashCode. Reusing the same hashCode for both steps can easily cause hash collisions.
> Consider mixing the hashCode when choosing the bucket.
> Like JDK HashMap, we can XOR in some shifted bits, which is the cheapest possible way to reduce systematic lossage and to incorporate the impact of the highest bits, which would otherwise never be used in index calculations because of table bounds (buckets use power-of-two masking). For example: (hash ^ (hash >>> 16))
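
The effect described above can be illustrated with a small, self-contained sketch. It is not the HybridHashTable code: the class name, the partition and bucket counts, and the use of power-of-two masking for the partition step are assumptions made only for this example (the issue states masking for buckets only). The sketch generates hashes that all fall into one partition and counts how many distinct buckets they reach with and without the proposed mix.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; names and sizes are not taken from Flink.
public class HashMixSketch {
    // Assumed sizes; both are powers of two so that masking works.
    static final int NUM_PARTITIONS = 16;
    static final int NUM_BUCKETS = 16;

    // Assumption: the partition is chosen by masking the low bits.
    static int partitionIndex(int hash) {
        return hash & (NUM_PARTITIONS - 1);
    }

    // Bucket chosen from the same, unmixed hash.
    static int bucketIndexUnmixed(int hash) {
        return hash & (NUM_BUCKETS - 1);
    }

    // Bucket chosen after the mix proposed in the issue.
    static int bucketIndexMixed(int hash) {
        return (hash ^ (hash >>> 16)) & (NUM_BUCKETS - 1);
    }

    // Counts the distinct buckets hit by 16 hashes that share a partition
    // but differ in bits 16..19.
    static int distinctBuckets(int partition, boolean mixed) {
        Set<Integer> used = new HashSet<>();
        for (int upper = 0; upper < 16; upper++) {
            int hash = (upper << 16) | partition;
            if (partitionIndex(hash) != partition) {
                throw new IllegalStateException("hash left the partition");
            }
            used.add(mixed ? bucketIndexMixed(hash) : bucketIndexUnmixed(hash));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("unmixed: " + distinctBuckets(3, false));
        System.out.println("mixed:   " + distinctBuckets(3, true));
    }
}
```

Under these assumptions every record in a partition shares the same low bits, so the unmixed bucket index is identical for all of them, while the mixed index also depends on the upper 16 bits and spreads the same records over all buckets.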



--
This message was sent by Atlassian Jira
(v8.3.4#803005)