You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2016/03/01 04:20:03 UTC

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2367

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................

IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
---
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
3 files changed, 58 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/67/2367/2
-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2367/3/testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
File testdata/workloads/functional-planner/queries/PlannerTest/join-order.test:

Line 799: 09:HASH JOIN [INNER JOIN]
This is a plan improvement because joining with alltypestiny first is better.


Line 914: |  08:HASH JOIN [INNER JOIN]
This is a plan improvement because joining with alltypessmall first is better.


Line 965: |  08:HASH JOIN [INNER JOIN]
This is a plan improvement because joining with alltypessmall first is better.


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2367/2/fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
File fe/src/main/java/com/cloudera/impala/analysis/TableRef.java:

Line 439:    * and the TupleDescriptor (desc) of this table has been created.
> update comment
Done


http://gerrit.cloudera.org:8080/#/c/2367/2/testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
File testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test:

Line 763
> what happened here?
This duplicate assignment was a combination of two bugs:

First. the getAllTupleIds() bug, see Analyzer.canEvalAntiJoinedConjunct() lead us to not pick up the conjunct in the regular getHashLookupJoinConjuncts().

Second, we incorrectly created a hash conjunct inferred equivalence classes. Since the original conjunct was still unassigned, we picked it up again as an other join conjunct,

I also fixed the incorrect inference based on equivalence classes now.


Line 1746: |  |  predicates: o.o_orderkey IS NULL, o.o_orderstatus IS NOT DISTINCT FROM o_orderpriority
> where did the 'is null' end up before?
In the scan as well as the join. That was the bug.


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has uploaded a new patch set (#4).

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................

IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
---
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
5 files changed, 99 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/67/2367/4
-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2367/7/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test:

Line 1922: # Q18 - Large Value Customer Query
shouldn't this really be

select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, l.sum_qty

from customer c, c.c_orders o,

(select sum(l_quantity) as sum_qty from o.o_lineitem l having sum(l_quantity) > 300) l

order by o_totalprice desc, o_orderdate
?


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2367/7/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test:

Line 1922: # Q18 - Large Value Customer Query
> shouldn't this really be
Good catch! The group by clause can be omitted here.

I'll add this improvement in a separate patch on trunk.


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Silvius Rus (Code Review)" <ge...@cloudera.org>.
Silvius Rus has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 5: Code-Review+2

Approved by gatekeeper.

-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has abandoned this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Abandoned

Moving over to cdh5-trunk because the bug being fixed has a low likelihood of being hit and the fix is risky.

-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Internal Jenkins, Silvius Rus,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2367

to look at the new patch set (#7).

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................

IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
---
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 247 insertions(+), 208 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/67/2367/7
-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 5: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/2367/3/fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java:

Line 319:   public void swapChildren() {
what's that for?


http://gerrit.cloudera.org:8080/#/c/2367/3/fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
File fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java:

Line 1392:           if (analyzer.hasValueTransfer(lhsSid, rhsSid) &&
we (mostly) use 

a\n

&& b

in this file, let's stick to that


Line 1393:               analyzer.hasValueTransfer(rhsSid, lhsSid)) {
isn't the lhs->rhs value transfer sufficient?


Line 1396:             // equalities are redundant
this comments seems to contradict the fact that you're doing this for all lhsSlotIds.

also, why is this done for all lhsSlotIds?


http://gerrit.cloudera.org:8080/#/c/2367/3/testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
File testdata/workloads/functional-planner/queries/PlannerTest/join-order.test:

Line 799: 09:HASH JOIN [INNER JOIN]
> This is a plan improvement because joining with alltypestiny first is bette
where does the improvement come from? it doesn't look like the predicates changed (so the card. estimates shouldn't have changed either).


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/2367/3/fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java
File fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java:

Line 319:   public void swapChildren() {
> what's that for?
Sorry. From testing. Removed.


http://gerrit.cloudera.org:8080/#/c/2367/3/fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
File fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java:

Line 1392:           if (analyzer.hasValueTransfer(lhsSid, rhsSid) &&
> we (mostly) use 
Done


Line 1393:               analyzer.hasValueTransfer(rhsSid, lhsSid)) {
> isn't the lhs->rhs value transfer sufficient?
Here is one example where that is not sufficient/correct:
 
select a.id aid, b.id bid, a.int_col aint, b.int_col bint
from functional.alltypes a
inner join functional.alltypes b
  on a.int_col < b.int_col
left outer join functional.alltypes c
  on a.id = b.id and a.id = c.id

We cannot make the a/b join a HJ because that would reject non-matches from 'c' which would have otherwise been nulled.

I added a clarifying comment as discussed.


Line 1396:             // equalities are redundant
> this comments seems to contradict the fact that you're doing this for all l
Good catch. We should break as soon as we added one predicate (all others are indeed redundant).

We do it for all slotIds because we need to find one with a mutual value transfer. Elements in the same equivalence class only require a value transfer in one direction (i.e. the eq class is a superset of those slots with mutual transfers)


http://gerrit.cloudera.org:8080/#/c/2367/3/testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
File testdata/workloads/functional-planner/queries/PlannerTest/join-order.test:

Line 799: 09:HASH JOIN [INNER JOIN]
> where does the improvement come from? it doesn't look like the predicates c
Notice that we have an inverted join right below these inner joins. The inversion, unfortunately, used to limit the plans we explored because we relied on the TableRef chain for checking join applicability.

Now that we cache the table ref and materialized tuple ids we can explore more plans after an inverted join.


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 6: Code-Review+2

Rebase, trivial test fix.

-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 7:

(1 comment)

There was an interesting failure related to now having the complete table ref ids in Analyzer.canEvalAntiJoinedConjunct(). That is now fixed, but has a few plan changes as a result.

Please take another look. We can also move this to trunk if you prefer.

http://gerrit.cloudera.org:8080/#/c/2367/7/fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
File fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java:

Line 765:             requiredTblRefIds.addAll(ref.getAllTupleIds());
This is the interesting change to fix this failing query:

 select c_custkey, c_mktsegment, o_orderkey, o_totalprice, o_orderdate
from tpch_nested_parquet.customer c, c.c_orders o
where c_custkey < 10
  and o_orderdate like "1992%"
  and cast(o_orderdate as timestamp) + interval 13 days not in
      (select cast(l_shipdate as timestamp)
       from o.o_lineitems)


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Internal Jenkins, Silvius Rus,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2367

to look at the new patch set (#6).

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................

IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
---
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
6 files changed, 99 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/67/2367/6
-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................

IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
---
M fe/src/main/java/com/cloudera/impala/analysis/BinaryPredicate.java
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
6 files changed, 101 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/67/2367/3
-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2367

to look at the new patch set (#5).

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................

IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
---
M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
5 files changed, 98 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/67/2367/5
-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2367/2/fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
File fe/src/main/java/com/cloudera/impala/analysis/TableRef.java:

Line 439:    * and the TupleDescriptor (desc) of this table has been created.
update comment


http://gerrit.cloudera.org:8080/#/c/2367/2/testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
File testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test:

Line 763
what happened here?


Line 1746: |  |  predicates: o.o_orderkey IS NULL, o.o_orderstatus IS NOT DISTINCT FROM o_orderpriority
where did the 'is null' end up before?


-- 
To view, visit http://gerrit.cloudera.org:8080/2367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes