You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2017/08/02 18:47:48 UTC

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7567

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................

IMPALA-5725: coalesce() with outer join incorrectly rewritten

A recent change, IMPALA-5016, added an expr rewrite rule to simplfy
coalesce(). This rule eliminates the coalesce() when its first
parameter (that isn't constant null) is a SlotRef pointing to a
SlotDescriptor that is non-nullable (for example because it is from
a non-nullable Kudu column or because it is from an HDFS partition
column with no null partitions), under the assumption that the SlotRef
could never have a null value.

This assumption is violated when the SlotRef is the output of an
outer join, leading to incorrect results being returned. The problem
is that the nullability of a SlotDescriptor (which determines whether
there is a null indicator bit in the tuple for that slot) is a
slightly different property than the nullability of a SlotRef pointing
to that SlotDescriptor (since the SlotRef can still be NULL if the
entire tuple is NULL).

This patch removes the portion of the rewrite rule that considers
the nullability of the SlotDescriptor. This means that we're missing
out on some optimizations opportunities and we should revisit this in
a way that works with outer joins (IMPALA-5753)

Testing:
- Updated FE tests.
- Added regression tests to exprs.test

Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
4 files changed, 30 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7567/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 1: Code-Review+2

(3 comments)

Nice, I think this looks good. I think we can commit this. Alex should weigh in when he's back to make sure we're not missing any other cases.

http://gerrit.cloudera.org:8080/#/c/7567/1/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

PS1, Line 57: , though
nit: can you put this in a separate sentence on the next line, e.g.

// if false, this slot cannot be NULL.
// NOTE: It is still possible...


PS1, Line 58: for
            :   // example as a result of this slot being outer joined on.
nit: '..., for example as the result of an outer join.'


http://gerrit.cloudera.org:8080/#/c/7567/1/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

PS1, Line 103:    * Simplify COALESCE by skipping leading nulls and applying the following transformations:
             :    * COALESCE(null, a, b) -> COALESCE(a, b);
             :    * COALESCE(<literal>, a, b) -> <literal>, when literal is not NullLiteral;
             :    * COALESCE(<partition-slotref>, a, b) -> <partition-slotref>,
             :    * when the partition column does not contain NULL.
update


-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2: Code-Review+2

carrying my +2...
I think we can commit this. Alex should weigh in when he's back to make sure we're not missing any other cases.

-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2: Code-Review+2

GVO failed due to IMPALA-5760

-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/987/

-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


IMPALA-5725: coalesce() with outer join incorrectly rewritten

A recent change, IMPALA-5016, added an expr rewrite rule to simplfy
coalesce(). This rule eliminates the coalesce() when its first
parameter (that isn't constant null) is a SlotRef pointing to a
SlotDescriptor that is non-nullable (for example because it is from
a non-nullable Kudu column or because it is from an HDFS partition
column with no null partitions), under the assumption that the SlotRef
could never have a null value.

This assumption is violated when the SlotRef is the output of an
outer join, leading to incorrect results being returned. The problem
is that the nullability of a SlotDescriptor (which determines whether
there is a null indicator bit in the tuple for that slot) is a
slightly different property than the nullability of a SlotRef pointing
to that SlotDescriptor (since the SlotRef can still be NULL if the
entire tuple is NULL).

This patch removes the portion of the rewrite rule that considers
the nullability of the SlotDescriptor. This means that we're missing
out on some optimizations opportunities and we should revisit this in
a way that works with outer joins (IMPALA-5753)

Testing:
- Updated FE tests.
- Added regression tests to exprs.test

Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Reviewed-on: http://gerrit.cloudera.org:8080/7567
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
4 files changed, 29 insertions(+), 44 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved
  Thomas Tauber-Marshall: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/979/

-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7567/1/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

PS1, Line 57: 
> nit: can you put this in a separate sentence on the next line, e.g.
Done


PS1, Line 58: a
            :   // NULL value if the entire tuple is NULL, for example as 
> nit: '..., for example as the result of an outer join.'
Done


http://gerrit.cloudera.org:8080/#/c/7567/1/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

PS1, Line 103:    * Simplify COALESCE by skipping leading nulls and applying the following transformations:
             :    * COALESCE(null, a, b) -> COALESCE(a, b);
             :    * COALESCE(<literal>, a, b) -> <literal>, when literal is not NullLiteral;
             :    */
             :   private Expr simplifyCoalesceFunctionCallExpr(Funct
> update
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/979/

-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5725: coalesce() with outer join incorrectly rewritten

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7567

to look at the new patch set (#2).

Change subject: IMPALA-5725: coalesce() with outer join incorrectly rewritten
......................................................................

IMPALA-5725: coalesce() with outer join incorrectly rewritten

A recent change, IMPALA-5016, added an expr rewrite rule to simplfy
coalesce(). This rule eliminates the coalesce() when its first
parameter (that isn't constant null) is a SlotRef pointing to a
SlotDescriptor that is non-nullable (for example because it is from
a non-nullable Kudu column or because it is from an HDFS partition
column with no null partitions), under the assumption that the SlotRef
could never have a null value.

This assumption is violated when the SlotRef is the output of an
outer join, leading to incorrect results being returned. The problem
is that the nullability of a SlotDescriptor (which determines whether
there is a null indicator bit in the tuple for that slot) is a
slightly different property than the nullability of a SlotRef pointing
to that SlotDescriptor (since the SlotRef can still be NULL if the
entire tuple is NULL).

This patch removes the portion of the rewrite rule that considers
the nullability of the SlotDescriptor. This means that we're missing
out on some optimizations opportunities and we should revisit this in
a way that works with outer joins (IMPALA-5753)

Testing:
- Updated FE tests.
- Added regression tests to exprs.test

Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
4 files changed, 29 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7567/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>