You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2019/04/05 17:06:58 UTC

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12939


Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may be identity (e.g. a = a) which may incorrectly reject rows
with nulls. We already have some logic to remove this kind of conjuncts
but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {0: b.b_id, 1: c.b_id}
 * {2: c.a_id, 4: t2.a_id, 8: t1.a_id}
 * {3: b.amount, 5: t2.amount1, 6: t2.amount2, 9: t1.amount1,
  10: t1.amount2}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check.

Actually, when checking IdentityPredicates, we need to resolved them
using base table slots (baseTblSmap_). So the predicate "t1.amount1 =
 t1.amount2" will be resolved to "amount = amount" and won't pass the
the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
3 files changed, 108 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2660/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Apr 2019 17:52:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1190
PS4, Line 1190:           Expr expr = e.trySubstitute(inlineViewRef.getBaseTblSmap(), analyzer, false);
> Yeah if we could add such a precondition at an appropriate place it would b
I add a check after initializing the baseTblSmap_ of inline views.


http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705
PS4, Line 1705: 
> Sure. Let me find out these kind of cases to reproduce the bug.
The substitutions in other places are for ResultExprs which are from the original query. They're not eq predicates that created by the Analyzer. So I think no problems there. I still add some tests to cover these cases but they can't reproduce the bug.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Apr 2019 14:23:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 2:

I intend to take a look at this but need to invest some time understanding the code around it to see if it has any potential pitfalls.


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Apr 2019 20:16:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2859/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Apr 2019 14:57:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1190
PS4, Line 1190:           Expr expr = e.trySubstitute(inlineViewRef.getBaseTblSmap(), analyzer, false);
> OK. Maybe we can add an assertion to flush out buggy cases of "smap[e1] == 
Yeah if we could add such a precondition at an appropriate place it would be useful, hopefully we can confirm they they hold or find out a reason why they don't?



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Apr 2019 21:42:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may finally be substituted to identity (e.g. a = a) which may
incorrectly reject rows with nulls. We already have some logic to remove
this kind of conjuncts but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {slot0 (b.b_id), slot1 (c.b_id)}
 * {slot2 (c.a_id), slot4 (t2.a_id), slot8 (t1.a_id)}
 * {slot3 (b.amount), slot5 (t2.amount1), slot6 (t2.amount2),
slot9 (t1.amount1), slot10 (t1.amount2)}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check. However, the substituted one
will finally be resolved to "amount = amount" and be assigned to the
left outer join node. So nulls are incorrectly rejected.

Actually, when checking IdentityPredicates, we need to check the final
resolved version of them using base table slots (baseTblSmap_). So the
predicate "t1.amount1 = t1.amount2" will be resolved to "amount = amount"
and won't pass the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test
 * Run all tests locally in CORE exploration strategy

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Reviewed-on: http://gerrit.cloudera.org:8080/12939
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
6 files changed, 410 insertions(+), 20 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2814/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Apr 2019 07:13:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 2:

(1 comment)

Tim, I'm also reviewing my patch and analyzing IMPALA-7957 these days. Don't have too much capacity in the community. I'll try to upload a new patch today.

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1175
PS1, Line 1175:     Predicate<Expr> isIdentityPredicate = new Predicate<Expr>() {
> Yes, this patch doesn't fix IMPALA-7957. I looked into IMPALA-7957 and foun
Sorry, I found a bug that this will also remove original identity predicates from the query. For example:

 explain select * from (select t2.a_id,t2.amount1,t2.amount2
    from a
    left outer join (
        select c.a_id, amount as amount1, amount as amount2
        from b join c  on b.b_id = c.b_id where amount = amount) t2
    on a.a_id = t2.a_id) t;

The original predicate "amount = amount" is removed finally. We should only remove identity predicates from the created set. Will upload a new patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 03:12:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1175
PS1, Line 1175:     Predicate<Expr> isIdentityPredicate = new Predicate<Expr>() {
> Sorry, I found a bug that this will also remove original identity predicate
Not a bug. The predicate "amount = amount" is correctly pushed down to the scan node. I just missed it...



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 15:39:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705
PS4, Line 1705: # IMPALA-8386: test coverage for ORDER BY/LIMIT
> To be clear, i didn't think there was a bug in your code, just wanted to ma
Yes, I understand that. To flush out bugs, just start an exhaustive test before running the GVO: https://jenkins.impala.io/job/pre-review-test/346/



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Apr 2019 23:16:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12939

to look at the new patch set (#3).

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may be identity (e.g. a = a) which may incorrectly reject rows
with nulls. We already have some logic to remove this kind of conjuncts
but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {0: b.b_id, 1: c.b_id}
 * {2: c.a_id, 4: t2.a_id, 8: t1.a_id}
 * {3: b.amount, 5: t2.amount1, 6: t2.amount2, 9: t1.amount1,
  10: t1.amount2}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check. However, the substituted one
will finally be resolved to "amount = amount" and be assigned to the
left outer join node. So nulls are incorrectly rejected.

Actually, when checking IdentityPredicates, we need to resolved them
using base table slots (baseTblSmap_). So the predicate "t1.amount1 =
 t1.amount2" will be resolved to "amount = amount" and won't pass the
the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
3 files changed, 141 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1175
PS1, Line 1175:     Predicate<Expr> isIdentityPredicate = new Predicate<Expr>() {
> https://issues.apache.org/jira/browse/IMPALA-7957 seems to be a similar cla
Yes, this patch doesn't fix IMPALA-7957. I looked into IMPALA-7957 and found that the problematic predicate is added at another code path in SingleNodePlanner#addUnassignedConjuncts: https://github.com/apache/impala/blob/9090fc239b62c0d698c6c256c20d86b69f8cc64f/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L360

The similarity is that they all use Analyzer#createEquivConjuncts to create predicates. I still need to understand codes around SingleNodePlanner#addUnassignedConjuncts. They are fairly old codes added by a 4 years old commit: https://github.com/apache/impala/commit/0752b9fa624d3588f3b15dbb5d1bc60d517412f8
Can't open the Gerrit reivew for that patch now. Maybe there're some discussions at that time.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 06 Apr 2019 14:41:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12939

to look at the new patch set (#5).

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may finally be substituted to identity (e.g. a = a) which may
incorrectly reject rows with nulls. We already have some logic to remove
this kind of conjuncts but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {slot0 (b.b_id), slot1 (c.b_id)}
 * {slot2 (c.a_id), slot4 (t2.a_id), slot8 (t1.a_id)}
 * {slot3 (b.amount), slot5 (t2.amount1), slot6 (t2.amount2),
slot9 (t1.amount1), slot10 (t1.amount2)}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check. However, the substituted one
will finally be resolved to "amount = amount" and be assigned to the
left outer join node. So nulls are incorrectly rejected.

Actually, when checking IdentityPredicates, we need to check the final
resolved version of them using base table slots (baseTblSmap_). So the
predicate "t1.amount1 = t1.amount2" will be resolved to "amount = amount"
and won't pass the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test
 * Run all tests locally in CORE exploration strategy

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
6 files changed, 410 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 3:

Changes in patch set 3:
* Add more logs that I found useful in debug
* Fix the order of marking assigned and ignoring identity predicates


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 16:10:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2737/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 16:53:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4056/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Apr 2019 14:48:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1175
PS1, Line 1175:     Predicate<Expr> isIdentityPredicate = new Predicate<Expr>() {
https://issues.apache.org/jira/browse/IMPALA-7957 seems to be a similar class of errors. I checked and confirmed that this patch doesn't fix it. Might be worth a quick look to see if there's a straightforward way to extend this fix.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Apr 2019 17:49:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Apr 2019 20:08:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12939

to look at the new patch set (#4).

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may finally be substituted to identity (e.g. a = a) which may
incorrectly reject rows with nulls. We already have some logic to remove
this kind of conjuncts but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {slot0 (b.b_id), slot1 (c.b_id)}
 * {slot2 (c.a_id), slot4 (t2.a_id), slot8 (t1.a_id)}
 * {slot3 (b.amount), slot5 (t2.amount1), slot6 (t2.amount2),
slot9 (t1.amount1), slot10 (t1.amount2)}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check. However, the substituted one
will finally be resolved to "amount = amount" and be assigned to the
left outer join node. So nulls are incorrectly rejected.

Actually, when checking IdentityPredicates, we need to check the final
resolved version of them using base table slots (baseTblSmap_). So the
predicate "t1.amount1 = t1.amount2" will be resolved to "amount = amount"
and won't pass the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test
 * Run all tests locally in CORE exploration strategy

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
3 files changed, 221 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Apr 2019 14:48:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705
PS4, Line 1705: # IMPALA-8386: test coverage for ORDER BY/LIMIT
> The substitutions in other places are for ResultExprs which are from the or
To be clear, i didn't think there was a bug in your code, just wanted to make sure we had some coverage of "interesting" cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Apr 2019 21:50:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12939/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12939/4//COMMIT_MSG@15
PS4, Line 15: For example, consider the following tables and a query:
Thanks for the nice explanation!


http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1190
PS4, Line 1190:           Expr expr = e.trySubstitute(inlineViewRef.getBaseTblSmap(), analyzer, false);
Intuitively this makes sense to me, since it's doing a "deeper" substitution that will "collapse" more expressions from the view into the same expressions on the base table.  I'm trying to convince myself that there are not edge cases where the previous check would have identified an equivalence predicate 

I.e. where  smap[e1] == smap[e2] but baseTblSmap[e1] != baseTableSmap[e2]

I think generally the first implies the second, but it's possible there are some other latent bugs.

I have a couple of requests for different tests that might catch any bugs with smap/baseTableSmap and the corresponding resultExprs_/baseTableResultExprs_ state.


http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705
PS4, Line 1705: 
Can we add a test where each subquery has an ORDER BY/LIMIT. This case is semi-interesting since there's an additional layer of substitution in createSortTupleInfo().

Also tests with a UNION inside the subquery since there is logic in UnionNode manipulating the different ResultExprs.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Apr 2019 19:43:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2669/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 06 Apr 2019 15:24:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 )

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1190
PS4, Line 1190:           Expr expr = e.trySubstitute(inlineViewRef.getBaseTblSmap(), analyzer, false);
> Intuitively this makes sense to me, since it's doing a "deeper" substitutio
OK. Maybe we can add an assertion to flush out buggy cases of "smap[e1] == smap[e2] but baseTblSmap[e1] != baseTableSmap[e2]".


http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705
PS4, Line 1705: 
> Can we add a test where each subquery has an ORDER BY/LIMIT. This case is s
Sure. Let me find out these kind of cases to reproduce the bug.



-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Apr 2019 14:45:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12939

to look at the new patch set (#2).

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may be identity (e.g. a = a) which may incorrectly reject rows
with nulls. We already have some logic to remove this kind of conjuncts
but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {0: b.b_id, 1: c.b_id}
 * {2: c.a_id, 4: t2.a_id, 8: t1.a_id}
 * {3: b.amount, 5: t2.amount1, 6: t2.amount2, 9: t1.amount1,
  10: t1.amount2}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check.

Actually, when checking IdentityPredicates, we need to resolved them
using base table slots (baseTblSmap_). So the predicate "t1.amount1 =
 t1.amount2" will be resolved to "amount = amount" and won't pass the
the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
3 files changed, 120 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>