Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2018/03/30 22:29:00 UTC

[jira] [Created] (IMPALA-6781) Stress tests should take into account non-deterministic results of queries

Michael Ho created IMPALA-6781:
----------------------------------

             Summary: Stress tests should take into account non-deterministic results of queries
                 Key: IMPALA-6781
                 URL: https://issues.apache.org/jira/browse/IMPALA-6781
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 3.0, Impala 2.12.0
            Reporter: Michael Ho
            Assignee: Michael Brown


Certain queries combine an order-by with a limit clause. When several rows share the same values for the order-by keys, the limit clause may cut off some of them, and which of those rows end up in the output is non-deterministic. Because the stress test stores the hashes of the query results obtained during binary search, it may sometimes raise a false alarm about a result mismatch. One way to fix it is to order by all the columns so that the results are always deterministic.
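
The effect can be reproduced outside Impala with a minimal Python sketch. Everything in it is an assumption made for illustration: the rows are hypothetical, {{result_hash}} is a simplified order-sensitive hash rather than the stress test's actual hashing code, and the two "runs" only differ in the order in which tied rows reach the sort.

```python
import hashlib


def result_hash(rows):
    """Hash a result set in row order (a simplified, assumed scheme)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def top_n(rows, key, n):
    """Emulate ORDER BY <key> LIMIT n.

    sorted() is stable, so rows with equal keys keep their arrival order,
    which stands in for the arbitrary order a distributed engine produces.
    """
    return sorted(rows, key=key)[:n]


def partial_key(row):
    # ORDER BY revenue DESC, o_orderdate (the original query's keys).
    return (-row[1], row[2])


def full_key(row):
    # Same keys plus l_orderkey and o_shippriority as tie-breakers.
    return (-row[1], row[2], row[0], row[3])


# Hypothetical rows: (l_orderkey, revenue, o_orderdate, o_shippriority).
rows = [
    (1, 100.0, "1995-02-13", 0),
    (2, 100.0, "1995-02-13", 0),
    (3, 100.0, "1995-02-13", 0),
    (4, 90.0, "1995-02-25", 0),
]
run_a = rows
run_b = list(reversed(rows))  # same rows, different arrival order

partial_hashes = {result_hash(top_n(run, partial_key, 2)) for run in (run_a, run_b)}
full_hashes = {result_hash(top_n(run, full_key, 2)) for run in (run_a, run_b)}

print(len(partial_hashes), len(full_hashes))  # 2 1
```

With the partial key, the two runs keep different rows from the tied group and the hashes differ. With every output column in the sort key, both runs return the same rows in the same order.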

For instance, the following TPC-H query may have multiple correct results:
{noformat}
    Default Db: tpch_10000_parquet
    Sql Statement: /* Mem: 8152 MB. Coordinator: vc1503.halxg.cloudera.com. */
select
  l_orderkey,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
  o_orderdate,
  o_shippriority
from
  customer,
  orders,
  lineitem
where
  c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < '1995-03-15'
  and l_shipdate > '1995-03-15'
group by
  l_orderkey,
  o_orderdate,
  o_shippriority
order by
  revenue desc,
  o_orderdate
limit 10
{noformat}
Correct result 1:
{noformat}
+-------------+-------------+-------------+----------------+
| l_orderkey  | revenue     | o_orderdate | o_shippriority |
+-------------+-------------+-------------+----------------+
| 18134335556 | 510414.0360 | 1995-02-13  | 0              |
| 52494073892 | 510414.0360 | 1995-02-13  | 0              |
| 26724270146 | 510414.0360 | 1995-02-13  | 0              |
| 954466400   | 510414.0360 | 1995-02-13  | 0              |
| 9544400966  | 510414.0360 | 1995-02-13  | 0              |
| 35314204736 | 510414.0360 | 1995-02-13  | 0              |
| 43904139302 | 510414.0360 | 1995-02-13  | 0              |
| 7133834945  | 503732.3318 | 1995-02-25  | 0              |
| 24313704101 | 503732.3318 | 1995-02-25  | 0              |
| 58673442437 | 503732.3318 | 1995-02-25  | 0              |
+-------------+-------------+-------------+----------------+
{noformat}
Correct result 2:
{noformat}
+-------------+-------------+-------------+----------------+
| l_orderkey  | revenue     | o_orderdate | o_shippriority |
+-------------+-------------+-------------+----------------+
| 26724270146 | 510414.0360 | 1995-02-13  | 0              |
| 43904139302 | 510414.0360 | 1995-02-13  | 0              |
| 35314204736 | 510414.0360 | 1995-02-13  | 0              |
| 52494073892 | 510414.0360 | 1995-02-13  | 0              |
| 18134335556 | 510414.0360 | 1995-02-13  | 0              |
| 9544400966  | 510414.0360 | 1995-02-13  | 0              |
| 954466400   | 510414.0360 | 1995-02-13  | 0              |
| 32903638691 | 503732.3318 | 1995-02-25  | 0              |
| 15723769511 | 503732.3318 | 1995-02-25  | 0              |
| 50083507847 | 503732.3318 | 1995-02-25  | 0              |
+-------------+-------------+-------------+----------------+
{noformat}
If one changes the limit clause to 15, we can see that seven rows share the ordering keys (503732.3318, 1995-02-25). With a limit of 10, the last 3 rows of the output can be any subset of those seven rows: their {{revenue}} and {{o_orderdate}} values are identical, so it is non-deterministic which of them are shown. The result will be deterministic if the order-by clause also includes {{l_orderkey}} and {{o_shippriority}}.
{noformat}
+-------------+-------------+-------------+----------------+
| l_orderkey  | revenue     | o_orderdate | o_shippriority |
+-------------+-------------+-------------+----------------+
| 26724270146 | 510414.0360 | 1995-02-13  | 0              |
| 43904139302 | 510414.0360 | 1995-02-13  | 0              |
| 35314204736 | 510414.0360 | 1995-02-13  | 0              |
| 52494073892 | 510414.0360 | 1995-02-13  | 0              |
| 18134335556 | 510414.0360 | 1995-02-13  | 0              |
| 9544400966  | 510414.0360 | 1995-02-13  | 0              |
| 954466400   | 510414.0360 | 1995-02-13  | 0              |
| 32903638691 | 503732.3318 | 1995-02-25  | 0              |
| 15723769511 | 503732.3318 | 1995-02-25  | 0              |
| 50083507847 | 503732.3318 | 1995-02-25  | 0              |
| 7133834945  | 503732.3318 | 1995-02-25  | 0              |
| 41493573281 | 503732.3318 | 1995-02-25  | 0              |
| 24313704101 | 503732.3318 | 1995-02-25  | 0              |
| 58673442437 | 503732.3318 | 1995-02-25  | 0              |
| 54409232481 | 499104.0645 | 1995-03-10  | 0              |
+-------------+-------------+-------------+----------------+
{noformat}
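
As a rough check on how much freedom the limit leaves, the following Python sketch counts the rows of the tie group that a limit of 10 cuts through. The rows are copied from the limit-15 output above; the helper names ({{ordering_key}}, {{tie_group_at_limit}}) are hypothetical and not part of the stress test. The count is only meaningful because the 15th row has different ordering keys, so the tie group is fully visible.

```python
from decimal import Decimal
from math import comb  # Python 3.8+

# Rows from the limit-15 output above:
# (l_orderkey, revenue, o_orderdate, o_shippriority)
ROWS_LIMIT_15 = [
    (26724270146, Decimal("510414.0360"), "1995-02-13", 0),
    (43904139302, Decimal("510414.0360"), "1995-02-13", 0),
    (35314204736, Decimal("510414.0360"), "1995-02-13", 0),
    (52494073892, Decimal("510414.0360"), "1995-02-13", 0),
    (18134335556, Decimal("510414.0360"), "1995-02-13", 0),
    (9544400966, Decimal("510414.0360"), "1995-02-13", 0),
    (954466400, Decimal("510414.0360"), "1995-02-13", 0),
    (32903638691, Decimal("503732.3318"), "1995-02-25", 0),
    (15723769511, Decimal("503732.3318"), "1995-02-25", 0),
    (50083507847, Decimal("503732.3318"), "1995-02-25", 0),
    (7133834945, Decimal("503732.3318"), "1995-02-25", 0),
    (41493573281, Decimal("503732.3318"), "1995-02-25", 0),
    (24313704101, Decimal("503732.3318"), "1995-02-25", 0),
    (58673442437, Decimal("503732.3318"), "1995-02-25", 0),
    (54409232481, Decimal("499104.0645"), "1995-03-10", 0),
]


def ordering_key(row):
    """The query's original order-by keys: revenue and o_orderdate."""
    return (row[1], row[2])


def tie_group_at_limit(rows, limit):
    """Return (rows_kept, group_size) for the tie group cut by `limit`.

    `rows` must already be in order-by order and must extend past the
    end of the tie group that contains row number `limit`.
    """
    boundary_key = ordering_key(rows[limit - 1])
    positions = [i for i, row in enumerate(rows) if ordering_key(row) == boundary_key]
    kept = sum(1 for i in positions if i < limit)
    return kept, len(positions)


kept, size = tie_group_at_limit(ROWS_LIMIT_15, limit=10)
possible_row_sets = comb(size, kept)
print(kept, size, possible_row_sets)  # 3 7 35
```

So a limit of 10 keeps 3 of 7 tied rows, and 35 different row sets are all correct answers to the original query, before even considering the order of rows within each tie group.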


