You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2017/10/18 17:54:35 UTC

[Impala-ASF-CR] IMPALA-5976: Remove equivalent class computation in FE

Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8317


Change subject: IMPALA-5976: Remove equivalent class computation in FE
......................................................................

IMPALA-5976: Remove equivalent class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from value the transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and adjacency list based DFS to speed up on both
condensed and sparse graphs.

Testing: A few test cases are modified because the change in equivalence
definition enables some existing optimizations on these cases. It passes
other existing tests. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 104ms and
the planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
30 files changed, 805 insertions(+), 921 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/main/java/org/apache/impala/util/IntIterator.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
33 files changed, 1,119 insertions(+), 785 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 9:

Added the outer join case in the comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 02:26:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> Slot A and slot B may be in the same tuple.
My point is that in "select * from A union B", A.col and B.col doesn't appear in the same tuple or row but there is value transfer. So "for all rows containing both slots,slot B has the same value as slot A" only covers the join case.



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 01:04:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8317

to look at the new patch set (#2).

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. On a query with 1800
union operations, the equivalence class computation time is reduced from
7m57s to 65ms and the planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
35 files changed, 1,147 insertions(+), 941 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@406
PS6, Line 406:       // For right anti and semi joins the lhs join slots does not appear in the output.
> I don't understand. Why are the slots in the rhsJoinPartition not returned 
I got confused. You are right.


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test:

http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1056
PS6, Line 1056: # around a check in refsNullableTupleId().
> - I removed refsNullableTupleId in the newest ps.
Makes sense. Let me take a look at the newest ps



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 01:22:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 3:

(3 comments)

Responding to comments, taking another pass now.

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91:  * Slot A has value transfer to slot B if a predicate on A can be applied to B in most
> "containing both slots" only covers join.
Slot A and slot B may be in the same tuple.

Still I think the formulation is correct because the value transfer is established at the point the corresponding equality condition is evaluated.

Example:
select * from A join B on a.id=b.id

We can say that a.id and b.id are definitely equal after the join, but they are independent before the join.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527
PS4, Line 1527:       // same as the one that makes destTid nullable
> Do you mean "v.b is null"?
I mean that the "is null" condition should be on a column that does not participate in the "A.a=B.b" condition to show that the NULL-checking predicate is unrelated to the slots participating in the equivalence.

For example this complete query:

select * from (select A.a a, A.b b, B.c c from A left join B on A.a=B.b) v where v.c is null


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
> Done. I just realized that full outer joins should have random output parti
Good call, I just realized that too.



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 00:46:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/main/java/org/apache/impala/util/IntIterator.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
33 files changed, 1,119 insertions(+), 785 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 6:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> Fair point. I was thinking of a definition like this:
"containing both slots" only covers join.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102
PS4, Line 102:  * Slot value transfers:
> Slot value transfers:
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105
PS4, Line 105:  * Each slot is contained in exactly one equivalence class. A slot equivalence class is a
> What does "Its" refer to here?
Done. The first 'its' is the value transfer relation. The second 'its' is the symmetric closure.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527
PS4, Line 1527:       // select * from (select A.a, B.b from A left join B on A.a=B.b) v where v.b is null
> to drive it home even further use "B.col is null" as the predicate to show 
Do you mean "v.b is null"?


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648
PS4, Line 1648:     // A map from equivalence class IDs to equivalence classes. The equivalence classes
> typo: classes
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813
PS4, Line 1813:    * Returns a map of slot equivalence classes on the set of slots in the given tuples.
> the given tuples
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838
PS4, Line 1838:    * propagation. Each mapping assigns every slot in srcSids to a slot in destTid which
> Each mapping assigns every slot in srcSids to a slot in destTid which has a
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS4, Line 1989:             "Condensed Graph:\n" + condensedTc);
> move the first "\n" to previous line
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998
PS4, Line 1998:     for (ExprId id : globalState_.conjuncts.keySet()) {
> remove (pretty clear from function comment)
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691
PS4, Line 691:    * Returns true if this expr matches 'that'. Two exprs match if:
> Move this into SlotRef since it's SlotRef specific?
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696
PS4, Line 696:    */
> Returns true if this expr matches 'that'
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699
PS4, Line 699:     if (this instanceof CastExpr && ((CastExpr)this).isImplicit()) {
> For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS4, Line 700:       return children_.get(0).matches(that, slotRefCmp);
> localEquals()
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719
PS4, Line 719:   protected boolean localEquals(Expr that) {
> I think we should separate matches() and localEquals() more cleanly. I thin
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@727
PS4, Line 727:    */
> Not checking fn_ seems a little strange. The function semantics would be cl
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@962
PS4, Line 962:         if (newExpr.matches(expr, cmp)) {
> according to matches() using 'cmp'
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS4, Line 198: 
> A wrapper around localEquals().
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     DataPartition outputPartition;
> Makes sense, please add a comment to explain.
Done. I just realized that full outer joins should have random output partition. I added a test case in the planner test. That test case breaks both ps4 and the base.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:     private final IntArrayList[] adjList_;
> This addresses the performance issue, but it does not address the testing/r
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@91
PS4, Line 91: 
> @Override
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@106
PS4, Line 106:     /**
> Don't understand the second sentence
That sentence is removed.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS4, Line 134:               visited.set(dstIt.peek());
> A graph condensed by its strongly-connected components (SCC). Vertices are 
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@152
PS4, Line 152:    * their SCCs and an inner graph on the SCCs is stored.
> @Override
Done


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@199
PS4, Line 199:         public int peek() {
> Very clear now! Thanks.
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 00:31:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 4:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281: 
> Let's please avoid code changes unrelated to the purpose of this patch as m
Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> We need a precise definition. The original definition was more precise.
I think If the predicate transfer is the starting point to consider whether there is a value transfer between slots, it should be defined as so. The original sentence "is computed based on..." is not really a definition to me.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  *
> Not sure this part adds value. What's the significance of these statements 
Removed. I intended to use these statements to show that the symmetric closure is an equivalence relation.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277:     // Tracks all warnings (e.g. non-fatal errors) that were generated during analysis.
> The SCC-condensed graph representation of all slot value transfers.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278:     // These are passed to the backend and eventually propagated to the shell. Maps from
> public/private?
Done. private.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:           conjunctIds.add(e.getId());
> a mutual value transfer
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:     List<TupleId> tids = Lists.newArrayList();
> how about: if every source slot has a value transfer to a slot in destTid
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:       boolean hasOuterJoinedTuple = false;
> Let's simplify the example and make it as clear as possible:
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:         // check if we already created this predicate
> Need a definition of equivalence class in the Analyzer class comment. You d
I think they are the same.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636:    * equivalent slot sets of lhsTids and rhsTids.
> Garbled sentence, please clean up
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:   public boolean isVisible(TupleId tid) {
> the registered value transfers
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973:         new WritableGraph(globalState_.descTbl.getMaxSlotId().asInt() + 1);
> missing space in string
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:       RandomAccessibleGraph reference =
> Add value-transfer edges to 'g' based on the registered equi-join conjuncts
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:       if (tblRef.getJoinOp() == JoinOperator.LEFT_OUTER_JOIN
> of the given slot id
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:         g.addEdge(outerSlot.asInt(), innerSlot.asInt());
> slot -> sid
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2066
PS3, Line 2066:       }
> Unfortunate that we have to do this. Should probably clean this up later by
Ack


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2073
PS3, Line 2073:    * Time complexity: O(V) where V = number of slots
> Simpler to avoid "image set" and just use your second sentence to describe.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2087
PS3, Line 2087:    * Time complexity: O(V) where V = number of slots
> Second sentence should be part of the equivalence class definition in the A
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2089
PS3, Line 2089:   public List<SlotId> getValueTransferTargets(SlotId srcSid) {
> We generally avoid these @param for brevity. I concede they might make sens
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@136
PS3, Line 136:   public boolean localEquals(Expr that) { return true; }
> Missing implementation?
localEquals only checks members defined in this subclass. ArithmeticExprs' node-local information is fn_, which is defined and compared in Expr.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@687
PS3, Line 687:     return Joiner.on(", ").join(strings);
> * Confusing name because we have a BinaryPredicate Expr and it's not clear 
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS3, Line 700:    * 3. For every pair of corresponding non-SlotRef exprs, localEquals returns true.
> I think similar() is confusing, how about matches() or equivalent()
Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@720
PS3, Line 720:     if (fn_ == null || that.fn_ == null) return false; // One null, one not
> It returns whether this expr is equal to that ignoring children.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@721
PS3, Line 721:     // Both fn_'s are not null
> Fix comment: Only one expr given to this function
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@723
PS3, Line 723:   }
> protected? Doesn't seem like a good public interface to force callers to ch
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@728
PS3, Line 728:    * Caller should guarantee that 'this' and 'that' have the same type.
> public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE
Changed to SlotrefComparator and moved into SlotRef.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@784
PS3, Line 784:     if (id_ == null) {
> update comment, should this function live in Analyzer instead?
Comment is updated and it's moved to Analyzer.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@121
PS3, Line 121:     if (v == 0 && 1.0 / v == Double.NEGATIVE_INFINITY) return false;
> Let's keep this patch focused on minimize unrelated changes like this one.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@182
PS3, Line 182:       if (type_ == null) {
> Let's keep this patch focused on minimize unrelated changes like this one.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@522
PS3, Line 522:     public boolean isCompatible(AnalyticExpr analyticExpr) {
> Please revert. This reformatted condition is pretty much incomprehensible t
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
> I don't understand this change. What does the join op have to do with how t
This is the output partitioning of this fragment. If it is a right outer join then the output is partitioned by the rhs join exprs, not lhs because the nulls in lhs can be in any partition. The partitioning compatibility check doesn't use equivalence class anymore and is wrong without this.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1377
PS3, Line 1377:       TableRef rhsTblRef = analyzer.getTableRef(rhsId);
> simplify with analyzer.getTupleId(lhsSid)
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@31
PS3, Line 31: 
> The flow of calls/code is somewhat hard to follow for our specific use case
Specialized most of them except for reflexiveTransitiveClosure(). It is called with different types for validation purpose.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@40
PS3, Line 40:       sb.append(i).append(" => ");
> mention that BFS is implemented iteratively to avoid unbounded stack usage
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@43
PS3, Line 43:       }
> This can be called on any type of Graph but we don't test that it actually 
It is called on RandomAccessibleGraph and WritableGraph. The former is called in condensedReflexiveTransitiveClosure and the latter is called in Analyzer.computeValueTransferGraph to generate a reference for graph validation. Maybe use a precondition check to exclude SccCondensedGraph?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@49
PS3, Line 49:   /**
> How about moving this out of the loop and preallocating it to numVertices()
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:    */
> This line in particular is confusing because dsts() is abstract and it's no
Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is more of a problem with the dsts interface. dsts() is not a good API because:
1. SccCondensedGraph.dsts() constructs a new array every time it's called.
2. The return value of SccCondensedGraph.dsts is owned by the caller. On the other hand the return value of RandomAccessibleGraph.dsts() and WritableGraph.dsts() is read only. There isn't a good way to enforce the read-only property.
3. RandomAccessibleGraph just needs an int[] rather than an IntArrayList. But dsts() forces it to use the latter.

I changed dsts() interface to returning an iterator. In this way SccCondensedGraph.dsts() doesn't need to "decompress" the result and the caller cannot modify the data in Graphs. Does this solve this problem?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@53
PS3, Line 53:   public RandomAccessibleGraph reflexiveTransitiveClosure() {
> style: ++j
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@98
PS3, Line 98:     public void addEdge(int srcVid, int dstVid) {
> Duplicate edges are allowed.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@128
PS3, Line 128:       int idx = Arrays.binarySearch(adjList_[srcVid], dstVid);
> lists
I just checked Wikipedia and a lecture note. An adjacency list is a list of lists. According to this, "sorted adjacency list" might be vague. I added some comments there.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@132
PS3, Line 132: 
> Construct from a list of sorted adjacency lists.
Reworded a little. But an adjacency list is a list of lists.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@150
PS3, Line 150:     }
> for the typical -> because the typical
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@210
PS3, Line 210:      * Get the ID of the strongly connected component containing 'vid'.
> What types of input Graphs are really supported/expected?
This function is merged into condensedReflexiveTransitiveClosure(). See below.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@222
PS3, Line 222:     /**
> This takes a Graph as input. Does it really work on any type of Graph? I do
Changed to WritableGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS3, Line 223:      * Get an array of vertices IDs in the scc. The caller shouldn't modify the returned
> Flow here is confusing because we create several intermediate SccCondensedG
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@257
PS3, Line 257:         // SCC as vid.
> Imo this doesn't really give the intuition for why the algorithm works.
Changed to a link.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@258
PS3, Line 258:         int unAssignedStackPos;
> lowest -> smallest?
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@268
PS3, Line 268:       int nextDfsIndex = 0;
> What kind of graph is expected here?
WritableGraph. Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@271
PS3, Line 271: u
> vertex ids
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@273
PS3, Line 273:           DfsContext ctx = dfsStack.peek();
> A mapping from vertex id to its DFS preordering index. -1 means not visited
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@282
PS3, Line 282:             // All successors have been searched. Check if this is the root of an SCC.
> * final
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@299
PS3, Line 299:               // Tree edge. DFS this successor. ctx.dstIt is not advanced until the DFS
> int nextDfsIndex = 0;
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@313
PS3, Line 313:               ctx.dstIt.next();
> typo: been
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@359
PS3, Line 359:         while (refIt.hasNext() || condensedIt.hasNext()) {
> What kind of graph is expected here?
WritableGraph. Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@26
PS3, Line 26: public class IntArrayList {
> private, you can add a data() getter
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@29
PS3, Line 29: 
> public
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@33
PS3, Line 33:     data_ = new int[capacity];
> public
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@44
PS3, Line 44:       int[] newData = new int[Math.max(data_.length * 2, capacity)];
> style: remove spaces after and before paranthesis
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@58
PS3, Line 58:   /**
> remove "some"
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@67
PS3, Line 67: 
> single line where it fits; apply everywhere in this file
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@115
PS3, Line 115: 
> Do we really need this? If not, please remove
Removed. It was used to enable Collections.sort() in Graph.validate().


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@49
PS3, Line 49:       fail();
> We should be sure to test all public functionality. As far as I can tell th
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 01:36:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 4:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> I think If the predicate transfer is the starting point to consider whether
Fair point. I was thinking of a definition like this:

Slot A has a value transfer to slot B if for all rows containing both slots slot B has the same value as slot A or the tuple containing slot B is NULL.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102
PS4, Line 102:  * Slot value transfer:
Slot value transfers:


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105
PS4, Line 105:  * Its symmetric closure is a equivalence relation. Its equivalence class is called slot
What does "Its" refer to here?

I think it's easier to state less formally:

Each slot is contained in exactly one equivalence class. A slot equivalence class is a set of slots where each pair of slots has a mutual value transfer. Equivalence classes are assigned an arbitrary id to distinguish them from another.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527
PS4, Line 1527:       // select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null
to drive it home even further use "B.col is null" as the predicate to show that the NULL-checking predicate is unrelated to the slots participating in the equivalence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648
PS4, Line 1648:     // A map from equivalence class IDs to equivalence classese. The equivalence classes
typo: classes


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813
PS4, Line 1813:    * Returns a map of slot equivalence classes on the set of slots in given tuples.
the given tuples


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838
PS4, Line 1838:    * propagation. Each mapping assigns every slot in srcSids to slot in destTid which has
Each mapping assigns every slot in srcSids to a slot in destTid which has a value transfer from srcSid.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS4, Line 1989:             + "\n" + tc + "Condensed Graph:\n" + condensedTc);
move the first "\n" to previous line


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998
PS4, Line 1998:     // transform equality predicates into a transfer graph
remove (pretty clear from function comment)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691
PS4, Line 691:   interface SlotRefComparator {
Move this into SlotRef since it's SlotRef specific?


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696
PS4, Line 696:    * Returns if this expr matches 'that'. Two exprs match if:
Returns true if this expr matches 'that'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699
PS4, Line 699:    * 2. For every pair of corresponding SlotRef, slotRefCmp.matches returns true.
For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS4, Line 700:    * 3. For every pair of corresponding non-SlotRef exprs, localEquals returns true.
localEquals()

(we generally use () for function names to make it clear we are referring to a function)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719
PS4, Line 719:     if (fn_ == null && that.fn_ == null) return true;
I think we should separate matches() and localEquals() more cleanly. I think localEquals() should be non-abstract and have a default implementation that checks the type and fn_ of this and that. Subclasses can override and call super.localEquals() first.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@727
PS4, Line 727:    * children and fn_.
Not checking fn_ seems a little strange. The function semantics would be cleaner if localEquals() compared this and that without children. Following my above suggestion will also avoid the mysterious "return true" implementations of localEquals() in a few places.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@962
PS4, Line 962:    * Return a new list without duplicate exprs (according to cmp).
according to matches() using 'cmp'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS4, Line 198:    * A wrapper around localEquals checking type equality, used for
A wrapper around localEquals().

(no need to explain further, the "checking type equality" is more confusing than helpful)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
> This is the output partitioning of this fragment. If it is a right outer jo
Makes sense, please add a comment to explain.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:    */
> Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is mo
This addresses the performance issue, but it does not address the testing/readability issue. If we expose a generic API like this on every subclass of Graph then this signals to the reader that it is meaningful to call this function on every subclass and therefore we need to test that all variants actually work.

We "could" add tests that show SccCondensedGraph.reflexiveTransitiveClosure() works correctly. But that is wasted effort and unnecessary code because we never intend/require reflexiveTransitiveClosure() to be called on an SccCondensedGraph. So the bigger issue to me is the "unnecessary abstraction" which adds cognitive burden to readers.

Here's a proposal to solve this issue.
* I believe we need reflexiveTransitiveClosure() on WritableGraph for validate() and and RandomAccessibleGraph for condensedReflexiveTransitiveClosure()
* How about we make reflexiveTransitiveClosure() a member of RandomAccessibleGraph or we make reflexiveTransitiveClosure() accept a RandomAccessibleGraph or something that makes the call specific RandomAccessibleGraph
* To address the validate() use case we can add a simple function that converts a WritableGraph to a RandomAccessibleGraph

This way we don't expose unnecessary generality and our existing tests cover all the code.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@91
PS4, Line 91:     public int numVertices() { return adjList_.length; }
@Override


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@106
PS4, Line 106:     // Adjacency list. Each in list the adjacency list is sorted.
Don't understand the second sentence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS4, Line 134:    * An strongly-connected-component(SCC)-condensed graph. Vertices are mapped to their
A graph condensed by its strongly-connected components (SCC). Vertices are mapped to their SCCs and an inner graph on the SCCs is stored.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@152
PS4, Line 152:     public int numVertices() { return sccIds_.length; }
@Override


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@199
PS4, Line 199:     public static SccCondensedGraph condensedReflexiveTransitiveClosure(WritableGraph g) {
Very clear now! Thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:26:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8317

to look at the new patch set (#3).

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. On a query with 1800
union operations, the equivalence class computation time is reduced from
7m57s to 65ms and the planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
35 files changed, 1,147 insertions(+), 941 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 3:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281:     return new FunctionCallExpr("if", ifParams);
Let's please avoid code changes unrelated to the purpose of this patch as much as reasonable. Cleanup is nice in general, but this patch is already complex and huge so let's not add anything extra.

Such changes can also make backporting more difficult due to conflicts, so cleanup should be done in a separate change.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91:  * Slot A has value transfer to slot B if a predicate on A can be applied to B in most
We need a precise definition. The original definition was more precise.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  * It is a reflexive, transitive binary relation on the set of slots.
Not sure this part adds value. What's the significance of these statements with respect to our use of value transfers for planning purposes? If we don't make use of these terms/properties anywhere else in the code, then we should remove these statements.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277:     // The SCC-condensed graph representation of value transfer relationship.
The SCC-condensed graph representation of all slot value transfers.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278:     SccCondensedGraph valueTransferGraph;
public/private?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:    * Create and register an auxiliary predicate to express an mutual value transfer
a mutual value transfer


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:    * by replacing the slots of a source predicate with slots of the destTid, if for each
how about: if every source slot has a value transfer to a slot in destTid


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:       // its referenced tuples are NULL. For example:
Let's simplify the example and make it as clear as possible:

select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:    * For each slot equivalence class, adds/removes predicates from conjuncts such that it
Need a definition of equivalence class in the Analyzer class comment. You do mention the term "equivalence class" but I don't think it has the same meaning of the "equivalence class" terminology used here.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636:     // A map from equivalence class ids to slot equivalence classes containing on slots
Garbled sentence, please clean up


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:    * Compute the value transfer graph based on the registered value transfer relation
the registered value transfers


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973:         String condensedTc = globalState_.valueTransferGraph.print();
missing space in string


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:    * Populate the value transfer relation from eq conjuncts to graph 'g'.
Add value-transfer edges to 'g' based on the registered equi-join conjuncts.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:    * Returns the equivalence class of slotID.
of the given slot id


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:   public List<SlotId> getEquivClass(SlotId slot) {
slot -> sid


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2066
PS3, Line 2066:     for (int dst: g.sccMembersByVid(slot.asInt())) {
Unfortunate that we have to do this. Should probably clean this up later by avoiding SlotIds in favor of ints or alternative having a pre-populated SlotId[] so we don't have to create new SlotId objects all over the place.

Let's leave this for now and keep thinking how to address this.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2073
PS3, Line 2073:    * Returns the src slot's value transfer image set. In other words it returns ids of all
Simpler to avoid "image set" and just use your second sentence to describe. In general, introducing a term that needs to be explained in the next sentence is not useful unless the term is really used very frequently.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2087
PS3, Line 2087:    * Get the id of the equivalence class of a slot. An equivalence class ID should be used
Second sentence should be part of the equivalence class definition in the Analyzer class comment


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2089
PS3, Line 2089:    * @param slotId The slot of interest.
We generally avoid these @param for brevity. I concede they might make sense for public functions, but this function is private.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@136
PS3, Line 136:   public boolean localEquals(Expr that) { return true; }
Missing implementation?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@687
PS3, Line 687:   interface ExprBinaryPredicate {
* Confusing name because we have a BinaryPredicate Expr and it's not clear what apply() returns.
* We should make this take two SlotRef arguments and not just any two Exprs. The reason is that we currently have no use for the general Expr case, and so we should make this as specific as possible to make the code easier to follow. In general, we should avoid unnecessary generality for this reason. If we end up needing the generality later we can still refactor (easy change).
* How about calling this SlotRefComparator, seems much clearer to me
* How about calling the function "matches()"


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS3, Line 700:   public boolean similar(Expr that, ExprBinaryPredicate localPredicate) {
I think similar() is confusing, how about matches() or equivalent()


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@720
PS3, Line 720:    * Local eq comparator. It returns whether two exprs are equal ignoring their children.
It returns whether this expr is equal to that ignoring children.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@721
PS3, Line 721:    * It is guaranteed that the two exprs given to this function share the same type.
Fix comment: Only one expr given to this function


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@723
PS3, Line 723:   abstract public boolean localEquals(Expr that);
protected? Doesn't seem like a good public interface to force callers to check that the types match


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@728
PS3, Line 728:   static final ExprBinaryPredicate localEqPredicate = new ExprBinaryPredicate() {
public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@784
PS3, Line 784:    * Compute the intersection of l1 and l2, given the smap, and
update comment, should this function live in Analyzer instead?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@121
PS3, Line 121:     return !(v == 0) || !(1.0 / v == Double.NEGATIVE_INFINITY);
Let's keep this patch focused on minimize unrelated changes like this one.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@182
PS3, Line 182:         Double d = value_.doubleValue();
Let's keep this patch focused on minimize unrelated changes like this one.

(Please also apply elsewhere)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@522
PS3, Line 522:       return !requiresIndependentEval(analyticExprs.get(0)) &&
Please revert. This reformatted condition is pretty much incomprehensible to me.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
I don't understand this change. What does the join op have to do with how this fragment is partitioned?

If this change is not required, please revert.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1377
PS3, Line 1377:           if (lhsTblRefIdsHs.contains(analyzer.getSlotDesc(lhsSid).getParent().getId())) {
simplify with analyzer.getTupleId(lhsSid)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@31
PS3, Line 31: public abstract class Graph {
The flow of calls/code is somewhat hard to follow for our specific use case. It's not obvious who calls what and where the main entry point is. Some of the exposed APIs are "generic" in the sense that they can work with an arbitrary Graph, but in reality we only expect some Graph types to be used. I think we should try to be more specific about the intended/tested use of these classes and APIs.

I'll point out specific examples below.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@40
PS3, Line 40:    * Compute the reflexive transitive closure of the graph by BFS from every vertex.
mention that BFS is implemented iteratively to avoid unbounded stack usage


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@43
PS3, Line 43:   public RandomAccessibleGraph reflexiveTransitiveClosure() {
This can be called on any type of Graph but we don't test that it actually works on all the Graph types. Instead of testing the irrelevant combinations we should move this call to the specific Graph type that we use/test in our code.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@49
PS3, Line 49:       IntArrayList queue = new IntArrayList();
How about moving this out of the loop and preallocating it to numVertices(). This way we can reset() inside the loop and we avoid allocating memory and creating objects inside the loop.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:         IntArrayList dstVids = dsts(queue.get(queueFront));
This line in particular is confusing because dsts() is abstract and it's not clear on which Graph subclass this is intended to be called on. We really never want to call reflexiveTransitiveClosure() on a SccCondensedGraph because that would be extremely expensive.

One way to resolve this might be to change this function to take an IntArrayList[] adjtList as an argument and return a new IntArrayList[] which is the tc. I think in general, several places could be cleaned up by operating on a IntArrayList[] instead of a Graph. That way it's clear what callers can pass in, and callers can decide what to do with the returned tc (e.g. create a RandomAccessibleGraph or whatever they want)

Just an idea, I'm certainly open to alternatives.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@53
PS3, Line 53:         for (int j = 0; j < dstVids.size(); j ++) {
style: ++j

Please address this everywhere, there are many instances of this particular style issue. It is easy to miss these while coding. For me, it often helps to review my own code on gerrit to flush out the simple issues before passing on to reviewers.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@98
PS3, Line 98:    * Graph supporting adding edges.
Duplicate edges are allowed.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@128
PS3, Line 128:     // Sorted adjacency list.
lists


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@132
PS3, Line 132:      * Construct from an sorted adjList.
Construct from a list of sorted adjacency lists.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@150
PS3, Line 150:      * instead of a standalone hash table for the typical use case has less than 10,000
for the typical -> because the typical


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@210
PS3, Line 210:     static SccCondensedGraph fromGraph(Graph g) {
What types of input Graphs are really supported/expected?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@222
PS3, Line 222:     public static SccCondensedGraph condensedReflexiveTransitiveClosure(Graph g){
This takes a Graph as input. Does it really work on any type of Graph? I don't think we have tests and I don't think we should have tests. Instead, how about accepting only a WritableGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS3, Line 223:       SccCondensedGraph condensed = fromGraph(g);
Flow here is confusing because we create several intermediate SccCondensedGraphs. Seems simpler to inline fromGraph() and avoid the unnecessary intermediate SccCondensedGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@257
PS3, Line 257:      * - DFS preordering indexes. It's the ordering in which vertices are visited.
Imo this doesn't really give the intuition for why the algorithm works.

I really like the "stack invariant" and "bookeeping" paragraphs from here:
https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

Maybe we should just add a link.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@258
PS3, Line 258:      * - Lowlinks. The lowlink of of a vertex is the lowest DFS preordering index
lowest -> smallest?

duplicate "of"


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@268
PS3, Line 268:     static private Pair<int[], int[][]> tarjanScc(final Graph g) {
What kind of graph is expected here?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@271
PS3, Line 271: v
vertex ids


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@273
PS3, Line 273:       // DFS preordering index. -1 indicates unvisited vertex.
A mapping from vertex id to its DFS preordering index. -1 means not visited.

Call this dfsIndexes or dfsIdxs to make the meaning clear


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@282
PS3, Line 282:         private int vid;
* final
* remove "private" from these members? seems confusing


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@299
PS3, Line 299:       int dfsOrdinal = 0;
int nextDfsIndex = 0;


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@313
PS3, Line 313:             // All children have ben searched. Check if this is the root of an SCC.
typo: been


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@359
PS3, Line 359:     static private RandomAccessibleGraph condenseGraphOnScc(Graph g, int[] sccIds,
What kind of graph is expected here?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@26
PS3, Line 26:   int[] data_;
private, you can add a data() getter


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@29
PS3, Line 29:   IntArrayList() {
public


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@33
PS3, Line 33:   IntArrayList(int capacity) {
public


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@44
PS3, Line 44:       System.arraycopy( data_, 0, newData, 0, data_.length );
style: remove spaces after and before paranthesis


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@58
PS3, Line 58:    * Remove some elements from the end the of list.
remove "some"


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@67
PS3, Line 67:   int size() {
single line where it fits; apply everywhere in this file


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@115
PS3, Line 115:   class IntegerIterator implements Iterator<Integer> {
Do we really need this? If not, please remove


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@49
PS3, Line 49:   public void testIntArrayList() {
We should be sure to test all public functionality. As far as I can tell these are not tested:

* IntIterator, and IntegerIterator
* set()



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:08:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1497/


-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 05:36:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 6:

(26 comments)

Patch is looking good!

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> My point is that in "select * from A union B", A.col and B.col doesn't appe
Got it, good point! Now I see where your original formulation is coming from. How about this:

Slot A has a value transfer to slot B if a predicate on A can be applied to B at some point in the plan. Determining the lowest correct placement of that predicate is subject to the conventional assignment rules.


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1987
PS6, Line 1987:         Preconditions.checkState(false, "Condensed transitive closure doesn't equal to " +
throw new IllegalStatsException(<msg>)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1988
PS6, Line 1988:             "uncondensed transitive closure. Uncondensed graph:\n" + tc +
Graph (capital)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS6, Line 1989:             "Condensed Graph:\n" + condensedTc);
I think we need a \n before "Condensed" as well


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2095
PS6, Line 2095:     Collections.sort(result);
comment why we are sorting here, also adjust function comment accordingly


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@692
PS6, Line 692:    * 1. The tree structures are the same and every pair of corresponding Expr nodes have
The tree structures ignoring implicit casts are the same.

(we don't explicitly check the return type, that's really part of condition 3 localEquals())


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@712
PS6, Line 712:     return localEquals(that);
Move this above the loop to detect mismatches faster?


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@716
PS6, Line 716:    * Local eq comparator. It returns whether this expr is equal to 'that' ignoring
Returns true if this expr is equal to 'that' ignoring children.


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@406
PS6, Line 406:       // For right anti and semi joins the lhs join slots does not appear in the output.
Right anti and semi joins do not produce NULLs so should the output partition not be lhsJoinPartition? Like you said, the slots in the rhsJoinPartition are not returned from such joins.


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@413
PS6, Line 413:       // Otherwise it's good to use the lhs partition.
Otherwise we're good to...


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@25
PS6, Line 25: import static java.lang.Math.addExact;
remove (not needed and does not compile on Java7)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@62
PS6, Line 62:     /** Return an iterator of vertex IDs with an edge from srcVid. */
remove (comment in super class is sufficient)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@176
PS6, Line 176:         final int[] condensedAdjList = condensed_.adjList_[sccIds_[srcVid]];
members should be private


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@185
PS6, Line 185:             adjListPos += 1;
++adjListPos


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@194
PS6, Line 194:           memberPos += 1;
++memberPos


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS6, Line 223:       // Step 2: Compute the reflexive transitive closure  . O(V(V+E))
extra spaces before O


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@356
PS6, Line 356:         condensedAdjList[srcSccId] = new int[bs.cardinality()];
same code is here and in L139, factor our into a helper


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@20
PS6, Line 20: import java.util.Arrays;
not needed


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@21
PS6, Line 21: import java.util.Iterator;
not needed


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@88
PS6, Line 88:       int pos_ = 0;
private


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntIterator.java
File fe/src/main/java/org/apache/impala/util/IntIterator.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntIterator.java@28
PS6, Line 28:       int pos_ = 0;
private


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@22
PS6, Line 22: /** Unit test for intArrayList. */
remove (pretty clear from the class name)


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test:

http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1055
PS6, Line 1055: # The last join in this query is the test subject and the first 2 joins are used to get
The second part needs a clearer explanation, for example,

the first two joins are used to ensure 'a' is made nullable before the fragment executing the full outer join


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1056
PS6, Line 1056: # around a check in refsNullableTupleId().
* I think the existing refsNullableTupleId() is subtly wrong. It should not matter whether an input fragment has made a tuple nullable already. If the tuple goes through an outer join, then the partition compatibility may not hold.
* I believe your change in createPartitionedHashJoinFragment() is the correct one and makes the refsNullableTupleId() check obsolete.

Hopefully this can help simplify this test.

You can also add another test to test the RANDOM partitioning of full outer joins that does not rely on aggregation (e.g. it tests join partition compatibility instead)


This query shows the incorrectness of the refsNullableTupeId() check:

select /* +straight_join */ t2.id, count(*)
from functional.alltypes t1
left outer join /* +shuffle */ functional.alltypessmall t2
  on t1.int_col = t2.int_col
right outer join /* +shuffle */ functional.alltypestiny t3
  on t2.id = t3.id
group by t2.id

Can you please file a new JIRA to track the refsNullableTupleId() bug? We can chat to see whether it's sensible to fix that in a separate patch or not.


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1057
PS6, Line 1057: select STRAIGHT_JOIN a.int_col, count(*) from functional.alltypes as a inner join
use /* +straight_join */ and also annotate the joins with /* +shuffle */ to make sure we get partitioned joins


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1097
PS6, Line 1097: 04:HASH JOIN [INNER JOIN, PARTITIONED]
I understand we need join 05 to make 'a' nullable but what's this join for?



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 08:06:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 05:35:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8317

to look at the new patch set (#4).

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. On a query with 1800
union operations, the equivalence class computation time is reduced from
7m57s to 65ms and the planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/main/java/org/apache/impala/util/IntIterator.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
36 files changed, 1,049 insertions(+), 755 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 09:07:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 2:

(46 comments)

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5976: Remove equivalence class computation in FE
> Remove equivalence classes from FE
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2890
PS1, Line 2890: 
> This function was nice for debugging. Does your patch add a similar functio
Added a function Graph.print, called when graph validation fails in testing.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1253
PS1, Line 1253:     e.getIds(tids, null);
> This new formatting is extremely hard to read, please reformat for readabil
Reverted to the original.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1496
PS1, Line 1496:       if (allDestSids.isEmpty()) continue;
> Unless we are absolutely convinced that the change is correct, I think we s
It is changed to checking whether src has value transfer to an an outer joined tuple, as discussed. The changed tests are also reverted.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1608
PS1, Line 1608:       }
> Refers to "equivalence class" again which is now an undefined term. If you 
Changed the definition of equivalence class at the top of this file.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1626
PS1, Line 1626:    * to cover the known slot equivalences. This function should be called for join
> What does this mean? Is the Integer key the SCC id?
Yes. Added a comment.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1802
PS1, Line 1802:    * Only contains equivalence classes with more than one member.
> I've seen this check in a couple of places, would be nice if that could be 
The handling is not exactly the same among call sites. For example getValueTransferTargets() returns a list of a single element, while here it is skipped. I think the graph data structure shouldn't be aware that there could be out of range slot ids and should fail on out of range parameters instead of handling them.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1952
PS1, Line 1952:     return false;
> computeValueTransferGraph
Done.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1972
PS1, Line 1972:         String tc = reference.print();
> valueTransfers (plural)
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2047
PS1, Line 2047:           || tblRef.getJoinOp() == JoinOperator.LEFT_ANTI_JOIN
> Comment need updating. The concept of "equivalence class" does not exist an
Equivalence class is now defined at the top of this file.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2055
PS1, Line 2055:   }
> In terms of the g API I think callers should have to care about the existen
Removed the getSccId -> getSccMembers indirection. Do you suggest removing the SCC terminology from the graph interfaces?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2070
PS1, Line 2070:   }
> Shouldn't the destinations already be sorted?
Not in current implementation since CondensedTransitiveClosure.getDests() assembles its result on the fly and the result is not sorted there. I haven't figured out why the call site of this function rely on the ordering. The current state is that removing this sort fails some tests (Not different plan. Some runtime filters disappeared.)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@702
PS1, Line 702:     if (this instanceof CastExpr && ((CastExpr)this).isImplicit()) {
> It's very unfortunate that we have to create a new pair object to do a Slot
Done. An interface ExprBinaryPredicate is added.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@1
PS1, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> Apache header
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@18
PS1, Line 18: package org.apache.impala.util;
> Comment on the ordering of elements (they are unordered correct?)
It was unordered. Since in the new patch binary search is used, it diverges into ordered and unordered types. Comments are added accordingly.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@19
PS1, Line 19: 
> I feel like this should really be int[][] for maximum efficiency. It genera
Implemented util.IntArrayList.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@23
PS1, Line 23: 
> style: ++i
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@36
PS1, Line 36:    */
> Not sure how useful these Preconditions checks are. We'll even get a more m
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@45
PS1, Line 45:     BitSet visited = new BitSet(numVertices());
> Does this really need to be public?
No but should the visibility be as strict as possible?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@46
PS1, Line 46:     for (int srcVid = 0; srcVid < numVertices(); ++srcVid) {
> Don't think this is needed.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@73
PS1, Line 73:       sb.append(i).append(" => ");
> I'm worried this might blow the stack for huge graphs - this would be a reg
Reimplemented in BFS.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@86
PS1, Line 86:   public static boolean validate(Graph a, Graph b) {
> Any idea how this compares to the previous transitive closure implementatio
In general this is O(V(V+E)) while the old one is O(V^3). The old one might be faster on a really dense graph because of the locality of looping/matrix representation. For sparse graphs this should be faster.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@89
PS1, Line 89:       if (!Ordering.natural().sortedCopy(a.dsts(i)).equals(
> Create outside of loop and clear() within loop to avoid generating objects 
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@107
PS1, Line 107:     }
> remove "belonging" the "its" already implies that
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@109
PS1, Line 109:     @Override
> (a list of original vertex ids)
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@110
PS1, Line 110:     public int numVertices() {
> Can this be an int[][] for the same reasons stated above?
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@114
PS1, Line 114:     @Override
> Does the HashSet really buy us much versus doing a binary search on the sor
The newest patch uses binary search. I did some experiments. At 100,000 scale HastSet<Integer> is 1.6X faster than binary search but uses 13X memory. If neither is satisfying we may consider primitive int hastset as well.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS1, Line 134:     private RandomAccessibleGraph(IntArrayList[] adjList) {
> Remove these Preconditions checks - they don't seem useful to me.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@144
PS1, Line 144:     public IntArrayList dsts(int srcVid) {
> Remove
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@156
PS1, Line 156:           dstVid);
> Remove
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@180
PS1, Line 180: 
> Please avoid chaining multiple non-trivial function calls because that make
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@195
PS1, Line 195:       return result;
> ... using Tarjan's algorithm.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@201
PS1, Line 201:      */
> sccIds
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@204
PS1, Line 204:     }
> Needs a better variable name. Not clear what "index" means; "indexes" is be
It is the order in which vertices are visited. It's called "NUMBER" in Tarjan's original paper. "DFS preordering index" should be more appropriate. The comments are updated and the variable is renamed into "indexes".


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@205
PS1, Line 205: 
> lowLinks
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@209
PS1, Line 209:      */
> Needs a better var name, what is this counting? Is this something like visi
Renamed into dfsOrdinal.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS1, Line 223:       SccCondensedGraph condensed = fromGraph(g);
> I think we should use vertexId or vid or similar consistently.
All the variable naming is changed to .*vid.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@227
PS1, Line 227: 
> Brief description should also be above L205
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@237
PS1, Line 237:      * Get an array of vertices ids in the scc. The caller shouldn't modify the returned
> int[] sccId
Done.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@239
PS1, Line 239:      * Time complexity: O(1)
> index[vertex] = lowLink[vertex] = indexCounter;
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@243
PS1, Line 243:     }
> A few simple comments here and there would make this code easier to follow.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@244
PS1, Line 244: 
> I'm worried that recursion might blow the stack and lead to a regression in
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@252
PS1, Line 252:     }
> Brief comment about this case/code, e.g. "Create an SCC from all elements o
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@253
PS1, Line 253: 
> use while()
The for loop no longer exists.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@265
PS1, Line 265:      * @param g The input graph.
> Project the original graph 'g' to a new graph in SCC space.
Done. (The term "project" is removed since it basically means the same thing as "condense").


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@274
PS1, Line 274:       int[] indexes = new int[g.numVertices()];
> Instead of having a separate normalize(), can't you remove duplicates right
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Oct 2017 01:43:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Reviewed-on: http://gerrit.cloudera.org:8080/8317
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/main/java/org/apache/impala/util/IntIterator.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
33 files changed, 1,176 insertions(+), 785 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 10
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/main/java/org/apache/impala/util/IntIterator.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
33 files changed, 1,176 insertions(+), 785 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/9
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................

IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
A fe/src/main/java/org/apache/impala/util/IntArrayList.java
A fe/src/main/java/org/apache/impala/util/IntIterator.java
A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
32 files changed, 1,130 insertions(+), 760 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5976: Remove equivalent class computation in FE

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalent class computation in FE
......................................................................


Patch Set 1:

(47 comments)

Very nice.

First wave of comments. I have not exhaustively gone through everything yet since this is a very substantial patch.

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5976: Remove equivalent class computation in FE
Remove equivalence classes from FE


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2017
PS1, Line 2017: 
Very satisfying to see this code deleted!


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2890
PS1, Line 2890: 
This function was nice for debugging. Does your patch add a similar function that nicely prints the two graphs?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1253
PS1, Line 1253:     return !tids.isEmpty() && (tids.size() > 1 || isOjConjunct(e) || isFullOuterJoined(e)
This new formatting is extremely hard to read, please reformat for readability. The old formatting was much better.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1496
PS1, Line 1496:         for (SlotId dst : getEquivSlots(srcSid)) {
Unless we are absolutely convinced that the change is correct, I think we should try to preserve the semantics of the original code. My understanding is that the new getEquivSlots() returns slots that have a mutual value transfer with srcSid - which is different than what the original code did.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1608
PS1, Line 1608:    * For each equivalence class, adds/removes predicates from conjuncts such that it
Refers to "equivalence class" again which is now an undefined term. If you want to use that term we should define what it means. 

Let's think about how to evolve the terminology. Based on my understanding of your patch it looks like you want to redefine "equivalence class" to be a set of slots where each pair of slots has a mutual value transfer.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1626
PS1, Line 1626:     // Equivalence classes only containing slots belonging to lhsTids.
What does this mean? Is the Integer key the SCC id?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1802
PS1, Line 1802:         if (slotDesc.getId().asInt() >= g.numVertices()) continue;
I've seen this check in a couple of places, would be nice if that could be handled by 'g' instead of all callsites.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1952
PS1, Line 1952:   public void computeValueTransfer() {
computeValueTransferGraph


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1972
PS1, Line 1972:   private void constructValueTransferFromEqPredicates(Graph g) {
valueTransfers (plural)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2047
PS1, Line 2047:    * Returns the ids of slots that are in the same equivalence class as slotId
Comment need updating. The concept of "equivalence class" does not exist anymore and we should not use that term.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2055
PS1, Line 2055:     for (int dst: g.getSccMembers(g.getSccId(slotId.asInt()))) {
In terms of the g API I think callers should have to care about the existence of SCCs. Can we hide those details?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2070
PS1, Line 2070:     Collections.sort(result);
Shouldn't the destinations already be sorted?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@702
PS1, Line 702:     if (!localCmp.apply(Pair.create(this, that))) return false;
It's very unfortunate that we have to create a new pair object to do a SlotRef comparison. We should try to avoid that generate an excessive number of objects. Expr.equals() really is called quite frequently and SlotRef is the most common Expr.

Using a Java Comparator is not ideal because we don't provide ordering. Using a Function is not ideal because of object generation. Since this is all our own code we could invent a new more suitable interface that accepts two Exprs or even two SlotRefs directly.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@1
PS1, Line 1: package org.apache.impala.util;
Apache header


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@18
PS1, Line 18:   // Adjacency list of the graph (src -> [dst]). Its size is numVertices().
Comment on the ordering of elements (they are unordered correct?)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@19
PS1, Line 19:   private final ArrayList<ArrayList<Integer>> adjList_;
I feel like this should really be int[][] for maximum efficiency. It generates fewer objects, consumes less memory, is probably slightly faster, and only slightly more difficult to implement.

The numVertices are known and we can manually reallocate/copy when adding elements to the second dimension. An ArrayList is basically a Object[] so it has object overhead, an additional level of indirection, and some overhead for get() versus [] for arrays. Yes, a lot of that is optimized away by the JIT, but it might really matter in this case.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@23
PS1, Line 23:     for (int i = 0; i < numVertices; i ++) adjList_.add(new ArrayList<Integer>());
style: ++i


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@36
PS1, Line 36:     Preconditions.checkState(src < numVertices());
Not sure how useful these Preconditions checks are. We'll even get a more meaningful exception if we remove them.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@45
PS1, Line 45:   public List<Integer> dests(int src) {
Does this really need to be public?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@46
PS1, Line 46:     Preconditions.checkState(src >= 0 && src < numVertices());
Don't think this is needed.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@73
PS1, Line 73:   private void reachabilityDfs(int vertex, BitSet visited) {
I'm worried this might blow the stack for huge graphs - this would be a regression from previous behavior (query that worked before would now fail to generate a plan). Seems safer to implement the DFS without recursion.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@86
PS1, Line 86:   public Graph getTransitiveClosure() {
Any idea how this compares to the previous transitive closure implementation in terms of speed? I tend to prefer this new version because it is simpler, but I am worried it might be slower.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@89
PS1, Line 89:       BitSet visited = new BitSet(numVertices());
Create outside of loop and clear() within loop to avoid generating objects / allocating bitset buffer.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@107
PS1, Line 107:     // Map an original vertex id to its belonging SCC id.
remove "belonging" the "its" already implies that


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@109
PS1, Line 109:     // Map an SCC id to its members (a list of vertex ids in original graph).
(a list of original vertex ids)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@110
PS1, Line 110:     private final ArrayList<ArrayList<Integer>> sccMemberList_;
Can this be an int[][] for the same reasons stated above?

Also "sccMembers_" seems sufficient


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@114
PS1, Line 114:     private final ArrayList<HashSet<Integer>> transitiveClosureRA_;
Does the HashSet really buy us much versus doing a binary search on the sorted vertex ids?
The HashSet does come with extra space/object overhead.

Do we ever iterate over this HashSet? We definitely need to avoid that due to getting back elements in a "random" order that changes across JVM versions.

If you convert the adjList to int[][] then you could use Arrays.binarySearch()


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS1, Line 134:       Preconditions.checkState(src >= 0 && src < numVertices());
Remove these Preconditions checks - they don't seem useful to me.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@144
PS1, Line 144:       Preconditions.checkState(src >= 0 && src < numVertices());
Remove


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@156
PS1, Line 156:     public int getSccId(int vertex) {
Remove


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@180
PS1, Line 180:       Graph transitiveClosure = projectGraphOnScc(g, scc.first).getTransitiveClosure();
Please avoid chaining multiple non-trivial function calls because that makes them easy for readers to miss. I think one step is left out of the comments:

Step 0: SCC
Step 1: Project graph into SCC space
Step 2: Compute transitive closure on projected graph
...


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@195
PS1, Line 195:      * Compute the strongly connected components.
... using Tarjan's algorithm.

It's a really neat algorithm that deserves a brief description here :)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@201
PS1, Line 201:       int[] sccId = new int[g.numVertices()];
sccIds


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@204
PS1, Line 204:       int[] index = new int[g.numVertices()];
Needs a better variable name. Not clear what "index" means; "indexes" is better but still does not convey what is indexed into. Are these really dfsIndexes? I feel like the concept of this "dfs index" needs to be explained.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@205
PS1, Line 205:       int[] lowLink = new int[g.numVertices()];
lowLinks


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@209
PS1, Line 209:       int indexCounter = 0;
Needs a better var name, what is this counting? Is this something like visitedOrdinal or dfsOrdinal?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS1, Line 223:      * @param vertex The vertex currently being searched.
I think we should use vertexId or vid or similar consistently.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@227
PS1, Line 227:      * @param lowLink The mapping from a vertex id ('v') to the lowest DFS index of
Brief description should also be above L205


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@237
PS1, Line 237:         BitSet onStack, int sccId[], ArrayList<ArrayList<Integer>> sccMemberList) {
int[] sccId


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@239
PS1, Line 239:       index[vertex] = lowLink[vertex] = indexCounter ++;
index[vertex] = lowLink[vertex] = indexCounter;
++indexCounter;


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@243
PS1, Line 243:         if (index[dst] == -1) {
A few simple comments here and there would make this code easier to follow. For example, there are 3 cases here, would be nice to describe them like this:

Case 1: dstVid has not been visited...
Case 2: dstVid has been visited and is on the stack ...
Case 3: dstVid has been visited and is not on the stack ...


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@244
PS1, Line 244:           indexCounter = tarjanSccDfs(dst, g, indexCounter, index, lowLink, stack,
I'm worried that recursion might blow the stack and lead to a regression in behavior (query that used to work now fails to plan).


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@252
PS1, Line 252:         ArrayList<Integer> sccMembers = new ArrayList<>();
Brief comment about this case/code, e.g. "Create an SCC from all elements on the stack up to 'vertex'.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@253
PS1, Line 253:         for (int stackTop = -1; stackTop != vertex; ) {
use while()


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@265
PS1, Line 265:      * Project the original graph represented by 'adjList' to a new graph on SCC space.
Project the original graph 'g' to a new graph in SCC space.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@274
PS1, Line 274:         for (int dst : g.dests(src)) {
Instead of having a separate normalize(), can't you remove duplicates right here by keeping track of the added sccs in a Bitset?



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:52:56 +0000
Gerrit-HasComments: Yes