You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2016/05/12 00:56:32 UTC

[Impala-CR](cdh5-trunk) IMPALA-3495: incorrect join result due to implicit cast in Murmur hash

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3034

Change subject: IMPALA-3495: incorrect join result due to implicit cast in Murmur hash
......................................................................

IMPALA-3495: incorrect join result due to implicit cast in Murmur hash

We observed that some spilling joins started returning incorrect
results. The behaviour seems to happen when a codegen'd insert and a
non-codegen'd probe function is used (or vice-versa). This only seems to
happen in a subset of cases.

The bug appears to be a result of the implicit cast of the uint32_t seed
value to the int32_t hash argument to HashTable::Hash(). The behaviour
is unspecified if the uint32_t does not fit in the int32_t. In Murmur
hash, this value is subsequently cast to a uint64_t, so we have a chain
of uint32_t->int32_t->uint64_t conversions. It would require a very
careful reading of the C++ standard to understand what the expected
result is, and whether we're seeing a compiler bug or just unspecified
behaviour, but we can avoid it entirely by keeping the values unsigned.

Testing:
I was able to reproduce the issue under a very specific of circumstances,
listed below. Before this change it consistently returned 0 rows. After the
change it consistently returned the correct results. I haven't had much
luck creating a suitable regression test.

* 1 impalad
* --disable_mem_pools=true
* use tpch_20_parquet;
* set mem_limit=1275mb;
* TPC-H query 7:

select
  supp_nation,
  cust_nation,
  l_year,
  sum(volume) as revenue
from (
  select
    n1.n_name as supp_nation,
    n2.n_name as cust_nation,
    year(l_shipdate) as l_year,
    l_extendedprice * (1 - l_discount) as volume
  from
    supplier,
    lineitem,
    orders,
    customer,
    nation n1,
    nation n2
  where
    s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and c_custkey = o_custkey
    and s_nationkey = n1.n_nationkey
    and c_nationkey = n2.n_nationkey
    and (
      (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
      or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
    )
    and l_shipdate between '1995-01-01' and '1996-12-31'
  ) as shipping
group by
  supp_nation,
  cust_nation,
  l_year
order by
  supp_nation,
  cust_nation,
  l_year

Change-Id: I952638dc94119a4bc93126ea94cc6a3edf438956
---
M be/src/exec/hash-table.h
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/34/3034/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I952638dc94119a4bc93126ea94cc6a3edf438956
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>