Posted to issues@drill.apache.org by "Venki Korukanti (JIRA)" <ji...@apache.org> on 2015/03/20 19:40:38 UTC

[jira] [Resolved] (DRILL-2402) Current method of combining hash values can produce skew

     [ https://issues.apache.org/jira/browse/DRILL-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-2402.
------------------------------------
          Resolution: Fixed
       Fix Version/s:     (was: 0.9.0)
                      0.8.0
    Target Version/s: 0.8.0

Fixed in [bb1d761|https://github.com/apache/drill/commit/bb1d7615e7eb6c0c17c0c8a1cde0ca070393e257].

> Current method of combining hash values can produce skew
> --------------------------------------------------------
>
>                 Key: DRILL-2402
>                 URL: https://issues.apache.org/jira/browse/DRILL-2402
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>            Reporter: Aman Sinha
>            Assignee: Jacques Nadeau
>             Fix For: 0.8.0
>
>         Attachments: DRILL-2402-1.patch
>
>
> The current method of combining hash values of multiple columns can produce skew in some cases even though each individual hash function does not produce skew.  The combining function is XOR: 
> {code}
>    hash(a, b) = XOR (hash(a), hash(b))
> {code}
> The above result will be 0 for all rows where a = b, because then hash(a) = hash(b). This creates severe skew and hurts the performance of queries that do a HashAggregate-based group-by on {a, b} or a HashJoin on both columns.
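
The skew described above is easy to reproduce outside Drill. The following is a minimal sketch, not Drill's code: the class and method names are hypothetical, the per-column hash is a stand-in (the 32-bit MurmurHash3 finalizer), and the chained combiner is just one common order-dependent alternative, not necessarily what the fix commit implements.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

public class HashCombineDemo {
    // Stand-in per-column hash (MurmurHash3 32-bit finalizer); an assumption, not Drill's function.
    public static int hash(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    // The combiner from the issue: hash(a, b) = XOR(hash(a), hash(b)).
    public static int xorCombine(int a, int b) {
        return hash(a) ^ hash(b);
    }

    // Hypothetical alternative: fold the first column's hash into the input of the second hash,
    // so equal columns no longer cancel each other out.
    public static int chainedCombine(int a, int b) {
        return hash(hash(a) * 31 + b);
    }

    // Counts distinct combined hashes over the n rows (0,0), (1,1), ..., (n-1,n-1), i.e. rows with a = b.
    public static int distinctOnDiagonal(IntBinaryOperator combiner, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(combiner.applyAsInt(i, i));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("XOR combiner, distinct hashes:     "
                + distinctOnDiagonal(HashCombineDemo::xorCombine, n));
        System.out.println("Chained combiner, distinct hashes: "
                + distinctOnDiagonal(HashCombineDemo::chainedCombine, n));
    }
}
```

With the XOR combiner every row with a = b hashes to 0, so all such rows land in a single bucket or partition no matter how good the per-column hash is. XOR is also commutative, so (a, b) and (b, a) always collide as well. An order-dependent combiner avoids both effects.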



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)