Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/09/26 21:37:00 UTC

[jira] [Created] (IMPALA-7636) Avoid storing hash in hash table bucket for hash tables in join

Tim Armstrong created IMPALA-7636:
-------------------------------------

             Summary: Avoid storing hash in hash table bucket for hash tables in join
                 Key: IMPALA-7636
                 URL: https://issues.apache.org/jira/browse/IMPALA-7636
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 3.1.0
            Reporter: Tim Armstrong


Somewhat related to IMPALA-7635, I think storing the precomputed hash in the hash table buckets is of questionable benefit for joins. It's useful for aggregations since we frequently resize the hash tables and can reuse the stored hash instead of recomputing it, but in joins it's only used to short-circuit calling Equal(), which often isn't that expensive. It's unclear how many calls to Equal() are actually avoided, so we should run some benchmarks to find out. As a sanity check for the idea, we could remove the (hash == bucket->hash) check in Probe() and see if performance is affected.
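
To make the proposed sanity check concrete, here is a minimal sketch of a probe loop that can be compiled with or without the stored-hash comparison and that counts how many times the key comparison runs. It is not Impala's HashTable: the Bucket layout, the ProbeStats counter, the kCheckHash template flag, the int keys, and the inline key comparison standing in for Equal() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified bucket. Impala's real bucket layout differs.
struct Bucket {
  bool filled;
  uint32_t hash;  // Precomputed hash stored next to the key.
  int key;
};

// Hypothetical counter for measuring how often the key comparison runs.
struct ProbeStats {
  std::size_t equal_calls;
};

// Linear probing over a power-of-two sized table. Returns the index of the
// matching bucket, or -1 if the key is absent. When kCheckHash is true, the
// key comparison (a stand-in for Equal()) only runs if the stored hash
// matches; when false, it runs for every filled bucket that is visited.
template <bool kCheckHash>
int Probe(const std::vector<Bucket>& buckets, int key, uint32_t hash,
          ProbeStats* stats) {
  assert(!buckets.empty() && (buckets.size() & (buckets.size() - 1)) == 0);
  const std::size_t mask = buckets.size() - 1;
  std::size_t idx = hash & mask;
  for (std::size_t step = 0; step < buckets.size(); ++step) {
    const Bucket& b = buckets[idx];
    if (!b.filled) return -1;  // Hit an empty slot: key is not present.
    if (!kCheckHash || b.hash == hash) {
      ++stats->equal_calls;
      if (b.key == key) return static_cast<int>(idx);
    }
    idx = (idx + 1) & mask;
  }
  return -1;  // Table is full and the key was not found.
}
```

Running the same probe workload through Probe<true> and Probe<false> and comparing equal_calls would give a rough count of the comparisons the stored hash avoids. Timing the two variants on real join data would still be needed to tell whether that matters.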

The difficult part here is figuring out how to share the HashTable code between the agg and join while using different bucket representations. Templates?
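
One way the template idea could look, as a rough sketch only: the table logic is written once and parameterized on a bucket type, and each bucket type decides whether it stores the hash. Everything here (HashTableSketch, BucketWithHash, BucketNoHash, SetHash, MaybeMatches, int keys, no resizing) is hypothetical and is not taken from Impala's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bucket for aggregations: keeps the precomputed hash.
struct BucketWithHash {
  bool filled;
  uint32_t hash;
  int key;
  void SetHash(uint32_t h) { hash = h; }
  bool MaybeMatches(uint32_t h) const { return hash == h; }
};

// Hypothetical bucket for joins: no stored hash, so every filled bucket
// visited during a probe must be compared by key.
struct BucketNoHash {
  bool filled;
  int key;
  void SetHash(uint32_t) {}
  bool MaybeMatches(uint32_t) const { return true; }
};

// Shared linear-probing logic, written once for both bucket layouts.
// Fixed capacity (a power of two); resizing is left out of this sketch.
template <typename BucketT>
class HashTableSketch {
 public:
  explicit HashTableSketch(std::size_t capacity) : buckets_(capacity) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  // Inserts the key into the first empty slot of its probe sequence.
  // Returns false if the table is full.
  bool Insert(int key, uint32_t hash) {
    const std::size_t mask = buckets_.size() - 1;
    std::size_t idx = hash & mask;
    for (std::size_t step = 0; step < buckets_.size(); ++step) {
      BucketT& b = buckets_[idx];
      if (!b.filled) {
        b.filled = true;
        b.key = key;
        b.SetHash(hash);
        return true;
      }
      idx = (idx + 1) & mask;
    }
    return false;
  }

  // Returns true if the key is present.
  bool Contains(int key, uint32_t hash) const {
    const std::size_t mask = buckets_.size() - 1;
    std::size_t idx = hash & mask;
    for (std::size_t step = 0; step < buckets_.size(); ++step) {
      const BucketT& b = buckets_[idx];
      if (!b.filled) return false;
      if (b.MaybeMatches(hash) && b.key == key) return true;
      idx = (idx + 1) & mask;
    }
    return false;
  }

 private:
  std::vector<BucketT> buckets_;
};
```

With this shape, the agg would instantiate HashTableSketch<BucketWithHash> and the join HashTableSketch<BucketNoHash>. The no-op SetHash() and always-true MaybeMatches() should inline away, so the join variant pays no per-bucket space for the hash.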



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org