You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2020/03/19 19:34:00 UTC

[jira] [Commented] (IMPALA-9434) Implement / Evaluate Robin Hood Hashing for exec/hash-table.h

    [ https://issues.apache.org/jira/browse/IMPALA-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062887#comment-17062887 ] 

Riza Suminto commented on IMPALA-9434:
--------------------------------------

Hi [~joemcdonnell], I'd like to try work on this JIRA.

My idea right now is to keep insertion as it is. After successful insert, we rebalance the table, starting from recently inserted index, wether there is a better (richer) bucket that we can swap it with. We swap if it is beneficial, and redo the evaluation over the evicted element. We continue to do so until we find empty bucket to swap with or current index is already the best.

> Implement / Evaluate Robin Hood Hashing for exec/hash-table.h
> -------------------------------------------------------------
>
>                 Key: IMPALA-9434
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9434
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Priority: Minor
>              Labels: ramp-up
>
> exec/hash-table.h's HashTable currently implements an open addressing hash table with linear and quadratic probing. Since there are few guarantees about the probe length, we limit the load factor to 0.75 to avoid high probe lengths. In general, probe lengths increase severely as the load factor goes higher than that.
> Robin Hood Hashing reduces the variances of probe lengths by continually rebalancing elements. It steals from the "rich" elements that have a very short probe length and gives to the "poor" elements with a long probe length. This tighter guarantee about the probe length allows for a higher load factor (0.9).
> We should evaluate Robin Hood Hashing to see if it improves our hash table performance. Since this hash table does not support deletes, that reduces the complexity. The Bucket already has some unused bytes in case we need them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org