You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/11/01 22:08:22 UTC

[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 3:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281:     return new FunctionCallExpr("if", ifParams);
Let's please avoid code changes unrelated to the purpose of this patch as much as reasonable. Cleanup is nice in general, but this patch is already complex and huge so let's not add anything extra.

Such changes can also make backporting more difficult due to conflicts, so cleanup should be done in a separate change.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91:  * Slot A has value transfer to slot B if a predicate on A can be applied to B in most
We need a precise definition. The original definition was more precise.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  * It is a reflexive, transitive binary relation on the set of slots.
Not sure this part adds value. What's the significance of these statements with respect to our use of value transfers for planning purposes? If we don't make use of these terms/properties anywhere else in the code, then we should remove these statements.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277:     // The SCC-condensed graph representation of value transfer relationship.
The SCC-condensed graph representation of all slot value transfers.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278:     SccCondensedGraph valueTransferGraph;
public/private?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:    * Create and register an auxiliary predicate to express an mutual value transfer
a mutual value transfer


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:    * by replacing the slots of a source predicate with slots of the destTid, if for each
how about: if every source slot has a value transfer to a slot in destTid


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:       // its referenced tuples are NULL. For example:
Let's simplify the example and make it as clear as possible:

select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:    * For each slot equivalence class, adds/removes predicates from conjuncts such that it
Need a definition of equivalence class in the Analyzer class comment. You do mention the term "equivalence class" but I don't think it has the same meaning of the "equivalence class" terminology used here.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636:     // A map from equivalence class ids to slot equivalence classes containing on slots
Garbled sentence, please clean up


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:    * Compute the value transfer graph based on the registered value transfer relation
the registered value transfers


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973:         String condensedTc = globalState_.valueTransferGraph.print();
missing space in string


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:    * Populate the value transfer relation from eq conjuncts to graph 'g'.
Add value-transfer edges to 'g' based on the registered equi-join conjuncts.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:    * Returns the equivalence class of slotID.
of the given slot id


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:   public List<SlotId> getEquivClass(SlotId slot) {
slot -> sid


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2066
PS3, Line 2066:     for (int dst: g.sccMembersByVid(slot.asInt())) {
Unfortunate that we have to do this. Should probably clean this up later by avoiding SlotIds in favor of ints or alternative having a pre-populated SlotId[] so we don't have to create new SlotId objects all over the place.

Let's leave this for now and keep thinking how to address this.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2073
PS3, Line 2073:    * Returns the src slot's value transfer image set. In other words it returns ids of all
Simpler to avoid "image set" and just use your second sentence to describe. In general, introducing a term that needs to be explained in the next sentence is not useful unless the term is really used very frequently.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2087
PS3, Line 2087:    * Get the id of the equivalence class of a slot. An equivalence class ID should be used
Second sentence should be part of the equivalence class definition in the Analyzer class comment


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2089
PS3, Line 2089:    * @param slotId The slot of interest.
We generally avoid these @param for brevity. I concede they might make sense for public functions, but this function is private.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@136
PS3, Line 136:   public boolean localEquals(Expr that) { return true; }
Missing implementation?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@687
PS3, Line 687:   interface ExprBinaryPredicate {
* Confusing name because we have a BinaryPredicate Expr and it's not clear what apply() returns.
* We should make this take two SlotRef arguments and not just any two Exprs. The reason is that we currently have no use for the general Expr case, and so we should make this as specific as possible to make the code easier to follow. In general, we should avoid unnecessary generality for this reason. If we end up needing the generality later we can still refactor (easy change).
* How about calling this SlotRefComparator, seems much clearer to me
* How about calling the function "matches()"


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS3, Line 700:   public boolean similar(Expr that, ExprBinaryPredicate localPredicate) {
I think similar() is confusing, how about matches() or equivalent()


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@720
PS3, Line 720:    * Local eq comparator. It returns whether two exprs are equal ignoring their children.
It returns whether this expr is equal to that ignoring children.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@721
PS3, Line 721:    * It is guaranteed that the two exprs given to this function share the same type.
Fix comment: Only one expr given to this function


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@723
PS3, Line 723:   abstract public boolean localEquals(Expr that);
protected? Doesn't seem like a good public interface to force callers to check that the types match


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@728
PS3, Line 728:   static final ExprBinaryPredicate localEqPredicate = new ExprBinaryPredicate() {
public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@784
PS3, Line 784:    * Compute the intersection of l1 and l2, given the smap, and
update comment, should this function live in Analyzer instead?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@121
PS3, Line 121:     return !(v == 0) || !(1.0 / v == Double.NEGATIVE_INFINITY);
Let's keep this patch focused on minimize unrelated changes like this one.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@182
PS3, Line 182:         Double d = value_.doubleValue();
Let's keep this patch focused on minimize unrelated changes like this one.

(Please also apply elsewhere)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@522
PS3, Line 522:       return !requiresIndependentEval(analyticExprs.get(0)) &&
Please revert. This reformatted condition is pretty much incomprehensible to me.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
I don't understand this change. What does the join op have to do with how this fragment is partitioned?

If this change is not required, please revert.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1377
PS3, Line 1377:           if (lhsTblRefIdsHs.contains(analyzer.getSlotDesc(lhsSid).getParent().getId())) {
simplify with analyzer.getTupleId(lhsSid)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@31
PS3, Line 31: public abstract class Graph {
The flow of calls/code is somewhat hard to follow for our specific use case. It's not obvious who calls what and where the main entry point is. Some of the exposed APIs are "generic" in the sense that they can work with an arbitrary Graph, but in reality we only expect some Graph types to be used. I think we should try to be more specific about the intended/tested use of these classes and APIs.

I'll point out specific examples below.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@40
PS3, Line 40:    * Compute the reflexive transitive closure of the graph by BFS from every vertex.
mention that BFS is implemented iteratively to avoid unbounded stack usage


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@43
PS3, Line 43:   public RandomAccessibleGraph reflexiveTransitiveClosure() {
This can be called on any type of Graph but we don't test that it actually works on all the Graph types. Instead of testing the irrelevant combinations we should move this call to the specific Graph type that we use/test in our code.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@49
PS3, Line 49:       IntArrayList queue = new IntArrayList();
How about moving this out of the loop and preallocating it to numVertices(). This way we can reset() inside the loop and we avoid allocating memory and creating objects inside the loop.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:         IntArrayList dstVids = dsts(queue.get(queueFront));
This line in particular is confusing because dsts() is abstract and it's not clear on which Graph subclass this is intended to be called on. We really never want to call reflexiveTransitiveClosure() on a SccCondensedGraph because that would be extremely expensive.

One way to resolve this might be to change this function to take an IntArrayList[] adjtList as an argument and return a new IntArrayList[] which is the tc. I think in general, several places could be cleaned up by operating on a IntArrayList[] instead of a Graph. That way it's clear what callers can pass in, and callers can decide what to do with the returned tc (e.g. create a RandomAccessibleGraph or whatever they want)

Just an idea, I'm certainly open to alternatives.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@53
PS3, Line 53:         for (int j = 0; j < dstVids.size(); j ++) {
style: ++j

Please address this everywhere, there are many instances of this particular style issue. It is easy to miss these while coding. For me, it often helps to review my own code on gerrit to flush out the simple issues before passing on to reviewers.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@98
PS3, Line 98:    * Graph supporting adding edges.
Duplicate edges are allowed.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@128
PS3, Line 128:     // Sorted adjacency list.
lists


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@132
PS3, Line 132:      * Construct from an sorted adjList.
Construct from a list of sorted adjacency lists.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@150
PS3, Line 150:      * instead of a standalone hash table for the typical use case has less than 10,000
for the typical -> because the typical


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@210
PS3, Line 210:     static SccCondensedGraph fromGraph(Graph g) {
What types of input Graphs are really supported/expected?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@222
PS3, Line 222:     public static SccCondensedGraph condensedReflexiveTransitiveClosure(Graph g){
This takes a Graph as input. Does it really work on any type of Graph? I don't think we have tests and I don't think we should have tests. Instead, how about accepting only a WritableGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS3, Line 223:       SccCondensedGraph condensed = fromGraph(g);
Flow here is confusing because we create several intermediate SccCondensedGraphs. Seems simpler to inline fromGraph() and avoid the unnecessary intermediate SccCondensedGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@257
PS3, Line 257:      * - DFS preordering indexes. It's the ordering in which vertices are visited.
Imo this doesn't really give the intuition for why the algorithm works.

I really like the "stack invariant" and "bookeeping" paragraphs from here:
https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

Maybe we should just add a link.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@258
PS3, Line 258:      * - Lowlinks. The lowlink of of a vertex is the lowest DFS preordering index
lowest -> smallest?

duplicate "of"


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@268
PS3, Line 268:     static private Pair<int[], int[][]> tarjanScc(final Graph g) {
What kind of graph is expected here?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@271
PS3, Line 271: v
vertex ids


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@273
PS3, Line 273:       // DFS preordering index. -1 indicates unvisited vertex.
A mapping from vertex id to its DFS preordering index. -1 means not visited.

Call this dfsIndexes or dfsIdxs to make the meaning clear


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@282
PS3, Line 282:         private int vid;
* final
* remove "private" from these members? seems confusing


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@299
PS3, Line 299:       int dfsOrdinal = 0;
int nextDfsIndex = 0;


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@313
PS3, Line 313:             // All children have ben searched. Check if this is the root of an SCC.
typo: been


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@359
PS3, Line 359:     static private RandomAccessibleGraph condenseGraphOnScc(Graph g, int[] sccIds,
What kind of graph is expected here?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@26
PS3, Line 26:   int[] data_;
private, you can add a data() getter


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@29
PS3, Line 29:   IntArrayList() {
public


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@33
PS3, Line 33:   IntArrayList(int capacity) {
public


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@44
PS3, Line 44:       System.arraycopy( data_, 0, newData, 0, data_.length );
style: remove spaces after and before paranthesis


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@58
PS3, Line 58:    * Remove some elements from the end the of list.
remove "some"


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@67
PS3, Line 67:   int size() {
single line where it fits; apply everywhere in this file


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@115
PS3, Line 115:   class IntegerIterator implements Iterator<Integer> {
Do we really need this? If not, please remove


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@49
PS3, Line 49:   public void testIntArrayList() {
We should be sure to test all public functionality. As far as I can tell these are not tested:

* IntIterator, and IntegerIterator
* set()



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:08:22 +0000
Gerrit-HasComments: Yes