You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/08 06:36:03 UTC

[GitHub] [incubator-doris] yangzhg opened a new pull request #3278: Optimzie where cluase when have duplicate ors

yangzhg opened a new pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278
 
 
   improve performent describe in #3245 
   for example sql
   ```
   select
      avg(ss_quantity),
      avg(ss_ext_sales_price),
      avg(ss_ext_wholesale_cost),
      sum(ss_ext_wholesale_cost)
   from
      store_sales,
      customer_address
   where
      (
         (
            ss_addr_sk = ca_address_sk
            and ca_country = 'United States'
            and ca_state in ('CO', 'IL', 'MN')
            and ss_net_profit between 100
            and 200
         )
         or (
            ss_addr_sk = ca_address_sk
            and ca_country = 'United States'
            and ca_state in ('OH', 'MT', 'NM')
            and ss_net_profit between 150
            and 300
         )
         or (
            ss_addr_sk = ca_address_sk
            and ca_country = 'United States'
            and ca_state in ('TX', 'MO', 'MI')
            and ss_net_profit between 50
            and 250
         )
      )
   ```
   now can be finished in 20s
   ![image](https://user-images.githubusercontent.com/9098473/78752220-422d0480-79a6-11ea-9f30-0c0a7f56491a.png)
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405583281
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
+        if (CollectionUtils.isEmpty(exprs) || exprs.size() < 2) {
+            return null;
+        }
+        // 1. remove duplcated elemnents [[a,a], [a, b], [a,b]] => [[a], [a,b]]
 
 Review comment:
   ```suggestion
           // 1. remove duplicated elements [[a,a], [a, b], [a,b]] => [[a], [a,b]]
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405584065
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
+        if (CollectionUtils.isEmpty(exprs) || exprs.size() < 2) {
+            return null;
+        }
+        // 1. remove duplcated elemnents [[a,a], [a, b], [a,b]] => [[a], [a,b]]
+        Set<Set<Expr>> set = new LinkedHashSet<>();
+        for (List<Expr> ex : exprs) {
+            Set<Expr> es = new LinkedHashSet<>();
+            es.addAll(ex);
+            set.add(es);
+        }
+        exprs.clear();
 
 Review comment:
   Better NOT to modify the parameter. It will make the method error-prone

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
kangkaisen commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405362817
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
 
 Review comment:
   Could we move this improve to `ExprRewriteRule` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405586193
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
+        if (CollectionUtils.isEmpty(exprs) || exprs.size() < 2) {
+            return null;
+        }
+        // 1. remove duplcated elemnents [[a,a], [a, b], [a,b]] => [[a], [a,b]]
+        Set<Set<Expr>> set = new LinkedHashSet<>();
+        for (List<Expr> ex : exprs) {
+            Set<Expr> es = new LinkedHashSet<>();
+            es.addAll(ex);
+            set.add(es);
+        }
+        exprs.clear();
+        for (Set<Expr> es : set) {
+            List<Expr> el = new ArrayList<>();
+            el.addAll(es);
+            exprs.add(el);
+        }
+        if (exprs.size() == 1) {
+            return makeCompound(exprs.get(0), CompoundPredicate.Operator.AND);
+        }
+        // 2. find duplcate cross the clause
+        List<Expr> cloneExprs = new ArrayList<>(exprs.get(0));
+        for (int i = 1; i < exprs.size(); ++i) {
+            cloneExprs.retainAll(exprs.get(i));
+        }
+        List<Expr> temp = new ArrayList<>();
+        if (CollectionUtils.isNotEmpty(cloneExprs)) {
+            temp.add(makeCompound(cloneExprs, CompoundPredicate.Operator.AND));
+        }
+
+        for (List<Expr> exprList : exprs) {
+            exprList.removeAll(cloneExprs);
+            temp.add(makeCompound(exprList, CompoundPredicate.Operator.AND));
+        }
+
+        // rebuild CompoundPredicate if found duplicate predicate will build (predcate) and (.. or ..)  predicate in
+        // step 1: will build (.. or ..)
+        Expr result = CollectionUtils.isNotEmpty(cloneExprs) ? new CompoundPredicate(CompoundPredicate.Operator.AND,
+                temp.get(0), makeCompound(temp.subList(1, temp.size()), CompoundPredicate.Operator.OR))
+                : makeCompound(temp, CompoundPredicate.Operator.OR);
+        LOG.info("rewrite ors: " + result.toSql());
 
 Review comment:
   ```suggestion
           LOG.debug("rewrite ors: " + result.toSql());
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
yangzhg commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405412210
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
 
 Review comment:
   ExprRewriteRule will apply to expr from bottom to up, this will make cannot get the longest or predicates

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
yangzhg commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405412210
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
 
 Review comment:
   ExprRewriteRule will apply to expr bottom-up, this will make cannot get the longest or predicates, this rewrite needed  top-down

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3278: Optimzie where cluase when have duplicate ors
URL: https://github.com/apache/incubator-doris/pull/3278#discussion_r405582439
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/analysis/SelectStmt.java
 ##########
 @@ -504,6 +509,132 @@ private void whereClauseRewrite() {
         }
     }
 
+    /**
+     * this function only process (a and b and c) or (d and e and f) like clause,
+     * this function will extract this to [[a, b, c], [d, e, f]]
+     */
+    private List<List<Expr>> extractDuplicateOrs(CompoundPredicate expr) {
+        List<List<Expr>> orExprs = new ArrayList<>();
+        for (Expr child : expr.getChildren()) {
+            if (child instanceof CompoundPredicate) {
+                CompoundPredicate childCp = (CompoundPredicate) child;
+                if (childCp.getOp() == CompoundPredicate.Operator.OR) {
+                    orExprs.addAll(extractDuplicateOrs(childCp));
+                    continue;
+                } else if (childCp.getOp() == CompoundPredicate.Operator.AND) {
+                    orExprs.add(flatAndExpr(child));
+                    continue;
+                }
+            }
+            orExprs.add(Arrays.asList(child));
+        }
+        return orExprs;
+    }
+
+    /**
+     * This function attempts to apply the inverse OR distributive law:
+     * ((A AND B) OR (A AND C))  =>  (A AND (B OR C))
+     * That is, locate OR clauses in which every subclause contains an
+     * identical term, and pull out the duplicated terms.
+     */
+    private Expr deduplicateOrs(Expr expr) {
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.OR) {
+            Expr rewritedExpr = processDuplicateOrs(extractDuplicateOrs((CompoundPredicate) expr));
+            if (rewritedExpr != null) {
+                return rewritedExpr;
+            }
+        } else {
+            for (int i = 0; i < expr.getChildren().size(); i++) {
+                Expr rewritedExpr = deduplicateOrs(expr.getChild(i));
+                if (rewritedExpr != null) {
+                    expr.setChild(i, rewritedExpr);
+                }
+            }
+        }
+        return expr;
+    }
+
+    /**
+     * try to flat and , a and b and c => [a, b, c]
+     */
+    private List<Expr> flatAndExpr(Expr expr) {
+        List<Expr> andExprs = new ArrayList<>();
+        if (expr instanceof CompoundPredicate && ((CompoundPredicate) expr).getOp() == CompoundPredicate.Operator.AND) {
+            andExprs.addAll(flatAndExpr(expr.getChild(0)));
+            andExprs.addAll(flatAndExpr(expr.getChild(1)));
+        } else {
+            andExprs.add(expr);
+        }
+        return andExprs;
+    }
+
+    /**
+     * the input is a list of list, the inner list is and connected exprs, the outer list is or connected
+     * for example clause (a and b and c) or (a and e and f) after extractDuplicateOrs will be [[a, b, c], [a, e, f]]
+     * this is the input of this function, first step is deduplicate [[a, b, c], [a, e, f]] => [[a], [b, c], [e, f]]
+     * then rebuild the expr to a and ((b and c) or (e and f))
+     */
+    private Expr processDuplicateOrs(List<List<Expr>> exprs) {
+        if (CollectionUtils.isEmpty(exprs) || exprs.size() < 2) {
 
 Review comment:
   Only `exprs.size() < 2` is enough.
   
   `exprs` will not be NULL here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org