You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/08 02:37:12 UTC

[GitHub] [incubator-doris] jackwener opened a new pull request, #8910: [feature](optimizer) : convert out join to inner join

jackwener opened a new pull request, #8910:
URL: https://github.com/apache/incubator-doris/pull/8910

   # Proposed changes
   
   Issue Number: close #7698
   
   ## Problem Summary:
   
   Add optimizer feature: outer join to inner join when there is not null predicate in joined table.
   
   Such as 
   
   ```sql
   select * 
   from table1 left join table2 
   on table1.siteid = table2.siteid 
   where table2.siteid > 0;
   
   ->
   
   select * 
   from table1 inner join table2 
   on table1.siteid = table2.siteid 
   where table2.siteid > 0;
   ``` 
   
   In this PR, if a table exist a non-null predicate in it, I will convert it join type
   - full outer join -> right outer join
   - left outer join -> inner join 
   
   to its next table:
   - full outer join -> left outer join
   - right outer join -> inner join
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (NO)
   2. Has unit tests been added: (No)
   3. Has document been added or modified: (No)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   
   in #7698. Just talk about two table join 
   
   - full outer -> inner if both sides have such predicates
   - left outer -> inner if the right side has such predicates
   - right outer -> inner if the left side has such predicates
   - full outer -> left outer if only the left side has such predicates
   - full outer -> right outer if only the right side has such predicates
   
   But for multi-table Join, it is worth noticing.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kpfly commented on pull request #8910: [feature](optimizer) : convert out join to inner join

Posted by GitBox <gi...@apache.org>.
kpfly commented on PR #8910:
URL: https://github.com/apache/incubator-doris/pull/8910#issuecomment-1092390070

   Is there any test cases?
   
   jakevin ***@***.***> 于2022年4月8日周五 10:37写道:
   
   > Proposed changes
   >
   > Issue Number: close #7698
   > <https://github.com/apache/incubator-doris/issues/7698>
   > Problem Summary:
   >
   > Add optimizer feature: outer join to inner join when there is not null
   > predicate in joined table.
   >
   > Such as
   >
   > select * from table1 left join table2 on table1.siteid = table2.siteid where table2.siteid > 0;
   > ->
   > select * from table1 inner join table2 on table1.siteid = table2.siteid where table2.siteid > 0;
   >
   > In this PR, if a table exist a non-null predicate in it, I will convert it
   > join type
   >
   >    - full outer join -> right outer join
   >    - left outer join -> inner join
   >
   > to its next table:
   >
   >    - full outer join -> left outer join
   >    - right outer join -> inner join
   >
   > Checklist(Required)
   >
   >    1. Does it affect the original behavior: (NO)
   >    2. Has unit tests been added: (No)
   >    3. Has document been added or modified: (No)
   >    4. Does it need to update dependencies: (No)
   >    5. Are there any changes that cannot be rolled back: (No)
   >
   > Further comments
   >
   > in #7698 <https://github.com/apache/incubator-doris/issues/7698>. Just
   > talk about two table join
   >
   >    - full outer -> inner if both sides have such predicates
   >    - left outer -> inner if the right side has such predicates
   >    - right outer -> inner if the left side has such predicates
   >    - full outer -> left outer if only the left side has such predicates
   >    - full outer -> right outer if only the right side has such predicates
   >
   > But for multi-table Join, it is worth noticing.
   > ------------------------------
   > You can view, comment on, or merge this pull request online at:
   >
   >   https://github.com/apache/incubator-doris/pull/8910
   > Commit Summary
   >
   >    - 9d41a4d
   >    <https://github.com/apache/incubator-doris/pull/8910/commits/9d41a4df5b76820a248e282bdf4bea585f8c20c5>
   >    [feature](optimizer) : convert out join
   >
   > File Changes
   >
   > (9 files <https://github.com/apache/incubator-doris/pull/8910/files>)
   >
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/BetweenPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-110c512713eadfe5bed29e16367524d7eab9e9e10ea46f70fe86fd53c9d6cb69>
   >    (34)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/BinaryPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-f491835ce20e42c028131456251d8c55595f933c9cff3fa24cae8f49398c2cc8>
   >    (170)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/CompoundPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-92f25580529a5e98927ebf1b29383e3d433f7658af1fa93fe79d4c56fc3db25c>
   >    (48)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/ExistsPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-0e13d9745f68c92643c607f7b8c98c4eda4aea8f68fcf2bb1816f734cd0b8cfd>
   >    (16)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/InPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-fbfd39c0da42d68e23e22ca3b74247273182ba49f2e952e6705667580836289b>
   >    (74)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/IsNullPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-0ecc5cd89891416801cc309bd82ebb6eb5d7dfcced16938926e8dfdc073072b5>
   >    (14)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/LikePredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-583a418186aa7a91fcf5915cd8e19b523d5637058dbafba73020e3a5f2deb961>
   >    (9)
   >    - *M* fe/fe-core/src/main/java/org/apache/doris/analysis/Predicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-9347d4dfc7e1c90455a55580e068d3edaba470d4c7242549d2dd107261b95395>
   >    (33)
   >    - *M*
   >    fe/fe-core/src/main/java/org/apache/doris/analysis/TupleIsNullPredicate.java
   >    <https://github.com/apache/incubator-doris/pull/8910/files#diff-9335aa57ca3fe046573e7cd033907df9f237cde2da9aaed7cfafc78c2640a647>
   >    (47)
   >
   > Patch Links:
   >
   >    - https://github.com/apache/incubator-doris/pull/8910.patch
   >    - https://github.com/apache/incubator-doris/pull/8910.diff
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/incubator-doris/pull/8910>, or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AXXX5T27ZQFRJMRGTVVZNELVD6LXHANCNFSM5S3FKB4Q>
   > .
   > You are receiving this because you are subscribed to this thread.Message
   > ID: ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morrySnow commented on a diff in pull request #8910: [feature](optimizer) : convert out join to inner join

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #8910:
URL: https://github.com/apache/incubator-doris/pull/8910#discussion_r851054253


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/SelectStmt.java:
##########
@@ -613,6 +617,102 @@ private void whereClauseRewrite() {
         }
     }
 
+    private void convertOutJoinToInnerJoin(Analyzer analyzer) {
+        // Use HashSet to ensure tableRefs are unique, such as avoid two table A in it;
+        HashSet<String> notNullTables = Sets.newHashSet();
+
+        // TODO[jackwener]: normalize the expr and consider more case
+        // Now, just check all is AND
+        if (PredicateUtils.existOr(whereClause))

Review Comment:
   always add brace for if statement



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/SelectStmt.java:
##########
@@ -613,6 +617,102 @@ private void whereClauseRewrite() {
         }
     }
 
+    private void convertOutJoinToInnerJoin(Analyzer analyzer) {
+        // Use HashSet to ensure tableRefs are unique, such as avoid two table A in it;
+        HashSet<String> notNullTables = Sets.newHashSet();
+
+        // TODO[jackwener]: normalize the expr and consider more case
+        // Now, just check all is AND
+        if (PredicateUtils.existOr(whereClause))
+            return;
+        List<Expr> flatExprs = PredicateUtils.flatAnd(whereClause);
+        if (flatExprs.isEmpty()) {
+            return;
+        }
+
+        // Find tables which is in not-null predicate
+        for (Expr expr : flatExprs) {
+            if (expr instanceof Predicate && ((Predicate) expr).isNotNullPred()) {
+                for (Expr child : expr.getChildren()) {
+                    if (child instanceof SlotRef) {
+                        SlotRef slotRef = (SlotRef) child;
+                        if (slotRef.getTableName() != null) {
+                            notNullTables.add(slotRef.getTableName().getTbl());
+                        }
+                    }
+                }
+            }
+        }
+
+        // For each not null predicate, find the table transfer outer join:
+        // self table:
+        // full outer join -> right outer join
+        // left outer join -> inner join
+        // next table:
+        // full outer join -> left outer join
+        // right outer join -> inner join
+        UnaryOperator<JoinOperator> convertSelf = joinOp -> {
+            if (joinOp.isFullOuterJoin()) {
+                return JoinOperator.RIGHT_OUTER_JOIN;
+            } else if (joinOp.isLeftOuterJoin()) {
+                return JoinOperator.INNER_JOIN;
+            }
+            return joinOp;
+        };
+        UnaryOperator<JoinOperator> convertNext = joinOp -> {
+            if (joinOp.isFullOuterJoin()) {
+                return JoinOperator.LEFT_OUTER_JOIN;
+            } else if (joinOp.isRightOuterJoin()) {
+                return JoinOperator.INNER_JOIN;
+            }
+            return joinOp;
+        };
+
+        for (int i = 0; i < fromClause.size(); i++) {
+            if (fromClause == null)
+                return;
+
+            TableRef tableRef = fromClause.get(i);
+            if (tableRef == null || tableRef.getName() == null)
+                continue;
+            if (notNullTables.contains(tableRef.getName().getTbl())) {
+                tableRef.setJoinOp(convertSelf.apply(tableRef.getJoinOp()));
+
+                if (i < fromClause.size() - 1) {
+                    TableRef nextTableRef = fromClause.get(i + 1);
+                    nextTableRef.setJoinOp(convertNext.apply(nextTableRef.getJoinOp()));
+                }
+            }
+        }
+
+        // Clean outerJoinedTupleIds/fullOuterJoinedTupleIds/fullOuterJoinedConjuncts in
+        // globalState.
+        // Make predicate can be pushed down after conversion.
+        ArrayList<ExprId> conjunctIds = Lists.newArrayList();
+        analyzer.getFullOuterJoinedConjuncts().forEach((id, tableRef) -> {
+            if (tableRef != null && tableRef.joinOp.isRightOuterJoin() || tableRef.joinOp.isInnerJoin()
+                    || (tableRef.leftTblRef != null
+                            && (tableRef.leftTblRef.joinOp.isLeftOuterJoin()
+                                    || tableRef.joinOp.isInnerJoin())))
+                conjunctIds.add(id);
+        });
+        conjunctIds.forEach(analyzer.getFullOuterJoinedConjuncts()::remove);
+
+        Consumer<Map<TupleId, TableRef>> remove = (tupleIds) -> {
+            ArrayList<TupleId> ids = Lists.newArrayList();
+            tupleIds.forEach((id, tableRef) -> {
+                if (tableRef != null && tableRef.joinOp.isRightOuterJoin() || tableRef.joinOp.isInnerJoin()
+                        || (tableRef.leftTblRef != null
+                                && (tableRef.leftTblRef.joinOp.isLeftOuterJoin()
+                                        || tableRef.joinOp.isInnerJoin())))
+                    ids.add(id);
+            });
+            ids.forEach(tupleIds::remove);
+        };
+        remove.accept(analyzer.getFullOuterJoinedTupleIds());
+        remove.accept(analyzer.getOuterJoinedTupleIds());

Review Comment:
   1. Do we need to remove conjuncts from `org.apache.doris.analysis.Analyzer.GlobalState#fullOuterJoinedConjuncts`?
   2. Do we need to remove TupleId from `org.apache.doris.analysis.Analyzer.GlobalState#outerJoinedMaterializedTupleIds`?
   
   @EmmyMiao87 could you help to check out whether it is correct?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java:
##########
@@ -1534,6 +1534,18 @@ public boolean isSjConjunct(Expr e) {
         return globalState.sjClauseByConjunct.containsKey(e.getId());
     }
 
+    public Map<ExprId, TableRef> getFullOuterJoinedConjuncts(){

Review Comment:
   nit: add space after right parenthesis



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/PredicateUtils.java:
##########
@@ -52,4 +52,46 @@ private static void splitDisjunctivePredicates(Expr expr, List<Expr> result) {
             result.add(expr);
         }
     }
+
+    /**
+     * Split predicates in conjunctive form recursively, i.e., split the input
+     * expression, if the root node of the expression tree is `and` predicate.
+     *
+     * Some examples:
+     * a and b -> a, b
+     * a and b and c -> a, b, c
+     * (a or b) and (c or d) -> (a or b), (c or d)
+     * (a and b) or c -> (a and b) or c
+     * a -> a
+     */
+    public static List<Expr> flatAnd(Expr expr) {
+        ArrayList<Expr> result = Lists.newArrayList();
+        if (expr == null) {
+            return result;
+        }
+
+        flatAnd(expr, result);
+        return result;
+    }
+
+    private static void flatAnd(Expr expr, List<Expr> result) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            flatAnd(expr.getChild(0), result);
+            flatAnd(expr.getChild(1), result);
+        } else {
+            result.add(expr);
+        }
+    }
+
+    public static boolean existOr(Expr expr) {
+        if (expr == null) {
+            return false;
+        }
+
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            return true;
+        } else {
+            return existOr(expr.getChild(0)) || existOr(expr.getChild(1));

Review Comment:
   expr always has two children?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/Predicate.java:
##########
@@ -41,6 +41,10 @@ public boolean isEqJoinConjunct() {
         return isEqJoinConjunct;
     }
 
+    // It's used for judging predicate output not-null value (and column is in a
+    // tableRef)
+    public abstract boolean isNotNullPred();

Review Comment:
   nit: use java doc comment style
   ```java
   /**
    *
    */
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/SelectStmt.java:
##########
@@ -613,6 +617,102 @@ private void whereClauseRewrite() {
         }
     }
 
+    private void convertOutJoinToInnerJoin(Analyzer analyzer) {
+        // Use HashSet to ensure tableRefs are unique, such as avoid two table A in it;
+        HashSet<String> notNullTables = Sets.newHashSet();
+
+        // TODO[jackwener]: normalize the expr and consider more case
+        // Now, just check all is AND
+        if (PredicateUtils.existOr(whereClause))
+            return;
+        List<Expr> flatExprs = PredicateUtils.flatAnd(whereClause);
+        if (flatExprs.isEmpty()) {
+            return;
+        }
+
+        // Find tables which is in not-null predicate
+        for (Expr expr : flatExprs) {
+            if (expr instanceof Predicate && ((Predicate) expr).isNotNullPred()) {
+                for (Expr child : expr.getChildren()) {
+                    if (child instanceof SlotRef) {
+                        SlotRef slotRef = (SlotRef) child;
+                        if (slotRef.getTableName() != null) {
+                            notNullTables.add(slotRef.getTableName().getTbl());

Review Comment:
   is it better to use TupleId instead of table name string ?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/SelectStmt.java:
##########
@@ -613,6 +617,102 @@ private void whereClauseRewrite() {
         }
     }
 
+    private void convertOutJoinToInnerJoin(Analyzer analyzer) {
+        // Use HashSet to ensure tableRefs are unique, such as avoid two table A in it;
+        HashSet<String> notNullTables = Sets.newHashSet();
+
+        // TODO[jackwener]: normalize the expr and consider more case
+        // Now, just check all is AND
+        if (PredicateUtils.existOr(whereClause))
+            return;
+        List<Expr> flatExprs = PredicateUtils.flatAnd(whereClause);
+        if (flatExprs.isEmpty()) {
+            return;
+        }
+
+        // Find tables which is in not-null predicate
+        for (Expr expr : flatExprs) {
+            if (expr instanceof Predicate && ((Predicate) expr).isNotNullPred()) {
+                for (Expr child : expr.getChildren()) {
+                    if (child instanceof SlotRef) {
+                        SlotRef slotRef = (SlotRef) child;
+                        if (slotRef.getTableName() != null) {
+                            notNullTables.add(slotRef.getTableName().getTbl());
+                        }
+                    }
+                }
+            }
+        }
+
+        // For each not null predicate, find the table transfer outer join:
+        // self table:
+        // full outer join -> right outer join
+        // left outer join -> inner join
+        // next table:
+        // full outer join -> left outer join
+        // right outer join -> inner join
+        UnaryOperator<JoinOperator> convertSelf = joinOp -> {
+            if (joinOp.isFullOuterJoin()) {
+                return JoinOperator.RIGHT_OUTER_JOIN;
+            } else if (joinOp.isLeftOuterJoin()) {
+                return JoinOperator.INNER_JOIN;
+            }
+            return joinOp;
+        };
+        UnaryOperator<JoinOperator> convertNext = joinOp -> {
+            if (joinOp.isFullOuterJoin()) {
+                return JoinOperator.LEFT_OUTER_JOIN;
+            } else if (joinOp.isRightOuterJoin()) {
+                return JoinOperator.INNER_JOIN;
+            }
+            return joinOp;
+        };
+
+        for (int i = 0; i < fromClause.size(); i++) {
+            if (fromClause == null)
+                return;
+
+            TableRef tableRef = fromClause.get(i);
+            if (tableRef == null || tableRef.getName() == null)
+                continue;
+            if (notNullTables.contains(tableRef.getName().getTbl())) {
+                tableRef.setJoinOp(convertSelf.apply(tableRef.getJoinOp()));
+
+                if (i < fromClause.size() - 1) {
+                    TableRef nextTableRef = fromClause.get(i + 1);
+                    nextTableRef.setJoinOp(convertNext.apply(nextTableRef.getJoinOp()));
+                }
+            }
+        }
+
+        // Clean outerJoinedTupleIds/fullOuterJoinedTupleIds/fullOuterJoinedConjuncts in
+        // globalState.
+        // Make predicate can be pushed down after conversion.
+        ArrayList<ExprId> conjunctIds = Lists.newArrayList();
+        analyzer.getFullOuterJoinedConjuncts().forEach((id, tableRef) -> {
+            if (tableRef != null && tableRef.joinOp.isRightOuterJoin() || tableRef.joinOp.isInnerJoin()

Review Comment:
   should add parentheses  after the first AND ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] jackwener closed pull request #8910: [feature](optimizer) : convert out join to inner join

Posted by GitBox <gi...@apache.org>.
jackwener closed pull request #8910: [feature](optimizer) : convert out join to inner join
URL: https://github.com/apache/incubator-doris/pull/8910


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] jackwener commented on pull request #8910: [WIP][feature](optimizer) : convert out join to inner join

Posted by GitBox <gi...@apache.org>.
jackwener commented on PR #8910:
URL: https://github.com/apache/incubator-doris/pull/8910#issuecomment-1096159553

   TODO:
   Now, this rule apply just predicate all is `AND` like `expr1 And expr2 And expr3`.
   
   The future ticklet is normalize `Expr`, we should conjunct all `OR`, like
   
   ```sql
   (A and B) or C -> (A or C) and (B or C)
   (A and B or C) or (D or E)-> (A and B and E) or (C and D) -> (A or C) and (A or D) and (B or C) and (B or D) and (E or C) and (E or D)
   ```
   
   For each (`left` `OR` `right`), `left` and `right` must have same `column`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org