Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/08/18 05:38:00 UTC

[jira] [Resolved] (IMPALA-5452) Nested subplans with non-trivial plan tree returns inconsistent results

     [ https://issues.apache.org/jira/browse/IMPALA-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm resolved IMPALA-5452.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

commit 79fba2768768107e408644e0caea15f60b5f3354
Author: Alex Behm <al...@cloudera.com>
Date:   Wed Aug 16 17:25:06 2017 -0700

    IMPALA-5452: Rewrite test case to avoid 'pos'.
    
    The original test case accessed the 'pos' field of nested
    collections. The query results could vary when reloading
    the data because the order of items within a nested
    collection is not necessarily the same across loads.
    
    This patch reformulates the test to avoid 'pos'.
    
    Change-Id: I32e47f0845da8b27652faaceae834e025ecff42a
    Reviewed-on: http://gerrit.cloudera.org:8080/7708
    Reviewed-by: Tim Armstrong <ta...@cloudera.com>
    Tested-by: Impala Public Jenkins
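
The reasoning in the commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the test from the patch: the data below is made up, 'pos' is treated as a 0-based index into the stored order of a collection, and the function names are hypothetical. It models a predicate of the form "o.pos in (select lead(l.l_linenumber) over (order by l.pos) ...)" and shows that its count changes when the same orders are stored in a different sequence, while a predicate that looks only at values does not.

```python
def lead(values):
    # LEAD(v) OVER (ORDER BY pos): each row sees the next row's value;
    # the last row sees NULL, modelled here as None.
    return values[1:] + [None]


def count_pos_matches(orders):
    # orders: one customer's orders in stored order; each order is the list
    # of its l_linenumber values in stored order (hypothetical data layout).
    count = 0
    for pos, linenumbers in enumerate(orders):  # pos assumed to be 0-based
        candidates = [v for v in lead(linenumbers) if v is not None]
        if pos in candidates:
            count += 1
    return count


def count_multi_item_orders(orders):
    # A value-only predicate (hypothetical): it never looks at 'pos'.
    return sum(1 for linenumbers in orders if len(linenumbers) > 1)


# The same three orders, stored in a different sequence by two data loads.
load_a = [[1, 2, 3], [1], [1, 2]]
load_b = [[1, 2, 3], [1, 2], [1]]

print(count_pos_matches(load_a), count_pos_matches(load_b))              # 1 0
print(count_multi_item_orders(load_a), count_multi_item_orders(load_b))  # 2 2
```

Only the sequence of the orders differs between the two loads, which is enough to change the 'pos'-dependent count.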


> Nested subplans with non-trivial plan tree returns inconsistent results
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-5452
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5452
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Anuj Phadke
>            Assignee: Alexander Behm
>            Priority: Critical
>              Labels: correctness
>             Fix For: Impala 2.10.0
>
>
> This query consistently returned a count of 18 when the tests were run locally.
> {code}
> Query: select count(*)
> from tpch_nested_parquet.customer c
> inner join c.c_orders o
> where c_custkey < 10 AND o.pos in
>  (select lead(l.l_linenumber) over (order by l.pos)
>   from o.o_lineitems l)
> Query submitted at: 2017-06-07 11:20:39 (Coordinator: http://anuj-OptiPlex-9020:25000)
> Query progress can be monitored at: http://anuj-OptiPlex-9020:25000/query_plan?query_id=a14392f4fe88572a:9cad5eb200000000
> +----------+
> | count(*) |
> +----------+
> | 18       |
> +----------+
> {code}
> But here -
> http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/696/console
> The same query returned a count of 19.
> {code}
> 08:25:10 ] -- executing against localhost:21000
> 08:25:10 ] select count(*)
> 08:25:10 ] from tpch_nested_parquet.customer c
> 08:25:10 ] inner join c.c_orders o
> 08:25:10 ] where c_custkey < 10 AND o.pos in
> 08:25:10 ]   (select lead(l.l_linenumber) over (order by l.pos)
> 08:25:10 ]    from o.o_lineitems l);
> 08:25:10 ] 
> 08:25:10 ] MainThread: Comparing QueryTestResults (expected vs actual):
> 08:25:10 ] 18 != 19
> {code}
> This test was introduced as part of this commit -
> https://gerrit.cloudera.org/#/c/916/4/testdata/workloads/functional-query/queries/QueryTest/subplans.test
> According to that test, the query should have returned a count of 14. These tests have not been run for a while, so the discrepancy went unnoticed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)