You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/14 07:27:34 UTC

[GitHub] [doris] englefly opened a new pull request, #13375: Estimate cost

englefly opened a new pull request, #13375:
URL: https://github.com/apache/doris/pull/13375

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287074463

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 42.28 seconds
    load time: 591 seconds
    storage size: 17154821287 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221021145135_clickbench_pr_32563.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003977227


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -69,6 +69,29 @@ private static class JoinEstimationResult {
         public double rowCount = 0;
     }
 
+    private static double estimateInnerJoin2(Join join, EqualTo equalto,

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287687641

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 39.58 seconds
    load time: 566 seconds
    storage size: 17154644791 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022170133_clickbench_pr_32713.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1001644423


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
  * Inspired by Presto.
  */
 public class CostCalculator {
+    static final double cpuWeight = 1;

Review Comment:
   ```suggestion
       static final double CPU_WEIGHT = 1;
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
  * Representation for group expression in cascades optimizer.
  */
 public class GroupExpression {
+    private double cost = 0.0;

Review Comment:
   is this lowest cost for all PhysicalProperties?



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/properties/PhysicalProperties.java:
##########
@@ -96,4 +96,10 @@ public int hashCode() {
         }
         return hashCode;
     }
+
+    @Override
+    public String toString() {
+        return distributionSpec.toString() + " " + orderSpec.toString();

Review Comment:
   it is better has the same format with plan toString



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
  * Representation for group expression in cascades optimizer.
  */
 public class GroupExpression {
+    private double cost = 0.0;
+    private CostEstimate costEstimate = null;

Review Comment:
   what's this use for?



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -69,6 +69,29 @@ private static class JoinEstimationResult {
         public double rowCount = 0;
     }
 
+    private static double estimateInnerJoin2(Join join, EqualTo equalto,

Review Comment:
   ```suggestion
       private static double estimateInnerJoinV2(Join join, EqualTo equalto,
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
  * Inspired by Presto.
  */
 public class CostCalculator {
+    static final double cpuWeight = 1;
+    static final double memorWeight = 1;
+    static final double networkWeight = 1.5;
+    static final double penaltyWeight = 0.5;
+    static final double heavyOperatorPunishFactor = 6.0;

Review Comment:
   plz add some comment to explain these two factors



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculatorV2.java:
##########
@@ -99,7 +99,10 @@ public static void estimate(GroupExpression groupExpression) {
 
     private void estimate() {
         StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
-        groupExpression.getOwnerGroup().setStatistics(stats);
+        StatsDeriveResult originStats = groupExpression.getOwnerGroup().getStatistics();
+        if (originStats == null || originStats.getRowCount() > stats.getRowCount()) {

Review Comment:
   ditto



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -104,7 +104,10 @@ public static void estimate(GroupExpression groupExpression) {
 
     private void estimate() {
         StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
-        groupExpression.getOwnerGroup().setStatistics(stats);
+        if (groupExpression.getOwnerGroup().getStatistics() == null
+                || (stats.getRowCount() < groupExpression.getOwnerGroup().getStatistics().getRowCount())) {

Review Comment:
   add comment to explain `stats.getRowCount() < groupExpression.getOwnerGroup().getStatistics().getRowCount())`



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/Literal.java:
##########
@@ -82,6 +82,10 @@ public static Literal of(Object value) {
 
     public abstract Object getValue();
 
+    public double getDouble() {

Review Comment:
   add java doc to this function to explain what it is use for



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/FilterEstimation.java:
##########
@@ -115,19 +105,166 @@ public StatsDeriveResult visitCompoundPredicate(CompoundPredicate predicate, Est
 
     @Override
     public StatsDeriveResult visitComparisonPredicate(ComparisonPredicate cp, EstimationContext context) {
+        boolean isNot = (context != null) && context.isNot;
         Expression left = cp.left();
         Expression right = cp.right();
-        ColumnStat statsForLeft = ExpressionEstimation.estimate(left, stats);
-        ColumnStat statsForRight = ExpressionEstimation.estimate(right, stats);
+        ColumnStat statsForLeft = ExpressionEstimation.estimate(left, inputStats);
+        ColumnStat statsForRight = ExpressionEstimation.estimate(right, inputStats);
 
         double selectivity;
         if (!(left instanceof Literal) && !(right instanceof Literal)) {
             selectivity = calculateWhenBothChildIsColumn(cp, statsForLeft, statsForRight);
         } else {
             // For literal, it's max min is same value.
-            selectivity = calculateWhenRightChildIsLiteral(cp, statsForLeft, statsForRight.getMaxValue());
+            selectivity = updateLeftStatsWhenRightChildIsLiteral(cp,
+                    statsForLeft,
+                    statsForRight.getMaxValue(),
+                    isNot);
+        }
+        StatsDeriveResult outputStats = new StatsDeriveResult(inputStats);
+        //TODO: we take the assumption that func(A) and A have the same stats.
+        outputStats.updateBySelectivity(selectivity, cp.getInputSlots());
+        if (left.getInputSlots().size() == 1) {
+            Slot leftSlot = left.getInputSlots().iterator().next();
+            outputStats.updateColumnStatsForSlot(leftSlot, statsForLeft);
+        }
+        return outputStats;
+    }
+
+    private double updateLessThan(ColumnStat statsForLeft, double val,

Review Comment:
   do we have enough ut to cover these code?



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
         return result;
     }
 
+    /**
+     * estimate join
+     */
+    public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+        JoinType joinType = join.getJoinType();
+        double rowCount = Double.MAX_VALUE;
+        if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+            rowCount = leftStats.getRowCount();
+        } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+            rowCount = rightStats.getRowCount();

Review Comment:
   why semi and anti join filter non rows?



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalOlapScan.java:
##########
@@ -105,7 +105,8 @@ public PreAggStatus getPreAggStatus() {
     public String toString() {
         return Utils.toSqlString("PhysicalOlapScan",
                 "qualified", Utils.qualifiedName(qualifier, olapTable.getName()),
-                "output", getOutput()
+                "output", getOutput(),
+                "stats=", statsDeriveResult

Review Comment:
   ```suggestion
                   "stats", statsDeriveResult
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/ExpressionEstimation.java:
##########
@@ -56,12 +62,20 @@ public ColumnStat visit(Expression expr, StatsDeriveResult context) {
         return expr.accept(this, context);
     }
 
+    public ColumnStat visitCaseWhen(CaseWhen caseWhen, StatsDeriveResult context) {
+        throw new RuntimeException("ExpressionEstimation case-when not implemented");
+    }
+
+    public ColumnStat visitCast(Cast cast, StatsDeriveResult context) {
+        return cast.child().accept(this, context);
+    }
+
     @Override
     public ColumnStat visitLiteral(Literal literal, StatsDeriveResult context) {
-        if (literal.isStringLiteral()) {
+        if (ColumnStat.MAX_MIN_UNSUPPORTED_TYPE.contains(literal.getDataType().toCatalogDataType())) {
             return ColumnStat.UNKNOWN;

Review Comment:
   should we return a default value instead of unknown?



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
         return result;
     }
 
+    /**
+     * estimate join
+     */
+    public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+        JoinType joinType = join.getJoinType();
+        double rowCount = Double.MAX_VALUE;
+        if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+            rowCount = leftStats.getRowCount();
+        } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+            rowCount = rightStats.getRowCount();
+        } else if (joinType == JoinType.INNER_JOIN) {
+            if (join.getHashJoinConjuncts().isEmpty()) {
+                //TODO: consider other join conjuncts
+                rowCount = leftStats.getRowCount() * rightStats.getRowCount();
+            } else {
+                for (Expression joinConjunct : join.getHashJoinConjuncts()) {
+                    double tmpRowCount = estimateInnerJoin2(join,
+                            (EqualTo) joinConjunct, leftStats, rightStats);
+                    rowCount = Math.min(rowCount, tmpRowCount);
+                }
+            }
+        } else if (joinType == JoinType.LEFT_OUTER_JOIN) {
+            rowCount = leftStats.getRowCount();
+        } else if (joinType == JoinType.RIGHT_OUTER_JOIN) {
+            rowCount = rightStats.getRowCount();
+        } else if (joinType == JoinType.CROSS_JOIN) {
+            rowCount = CheckedMath.checkedMultiply(leftStats.getRowCount(),
+                    rightStats.getRowCount());
+        } else {
+            throw new RuntimeException("joinType is not supported");
+        }
+
+        StatsDeriveResult statsDeriveResult = new StatsDeriveResult(rowCount, Maps.newHashMap());
+        if (joinType.isRemainLeftJoin()) {
+            statsDeriveResult.merge(leftStats);
+        }
+        if (joinType.isRemainRightJoin()) {
+            statsDeriveResult.merge(rightStats);
+        }
+        statsDeriveResult.setRowCount(rowCount);
+        statsDeriveResult.setWidth(rightStats.getWidth() + leftStats.getWidth());
+        statsDeriveResult.setPenalty(0.0);
+        return statsDeriveResult;
+    }
+
     /**
      * Do estimate.
      * // TODO: since we have no column stats here. just use a fix ratio to compute the row count.
      */
-    public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+    public static StatsDeriveResult estimate2(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {

Review Comment:
   ```suggestion
       public static StatsDeriveResult estimateV2(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/ExpressionEstimation.java:
##########
@@ -56,12 +62,20 @@ public ColumnStat visit(Expression expr, StatsDeriveResult context) {
         return expr.accept(this, context);
     }
 
+    public ColumnStat visitCaseWhen(CaseWhen caseWhen, StatsDeriveResult context) {
+        throw new RuntimeException("ExpressionEstimation case-when not implemented");

Review Comment:
   remove exception, use a default selectivity and add a todo



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1002866177


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/ExpressionEstimation.java:
##########
@@ -56,12 +62,20 @@ public ColumnStat visit(Expression expr, StatsDeriveResult context) {
         return expr.accept(this, context);
     }
 
+    public ColumnStat visitCaseWhen(CaseWhen caseWhen, StatsDeriveResult context) {
+        throw new RuntimeException("ExpressionEstimation case-when not implemented");
+    }
+
+    public ColumnStat visitCast(Cast cast, StatsDeriveResult context) {
+        return cast.child().accept(this, context);
+    }
+
     @Override
     public ColumnStat visitLiteral(Literal literal, StatsDeriveResult context) {
-        if (literal.isStringLiteral()) {
+        if (ColumnStat.MAX_MIN_UNSUPPORTED_TYPE.contains(literal.getDataType().toCatalogDataType())) {
             return ColumnStat.UNKNOWN;

Review Comment:
   'unknown' is the default value



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003972708


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
  * Inspired by Presto.
  */
 public class CostCalculator {
+    static final double cpuWeight = 1;
+    static final double memorWeight = 1;
+    static final double networkWeight = 1.5;
+    static final double penaltyWeight = 0.5;
+    static final double heavyOperatorPunishFactor = 6.0;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003964187


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
  * Representation for group expression in cascades optimizer.
  */
 public class GroupExpression {
+    private double cost = 0.0;

Review Comment:
   No. It is the cost of itself, the enforced cost is not counted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1289986198

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.79 seconds
    load time: 563 seconds
    storage size: 17154644849 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221025130527_clickbench_pr_33576.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003962955


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
  * Representation for group expression in cascades optimizer.
  */
 public class GroupExpression {
+    private double cost = 0.0;
+    private CostEstimate costEstimate = null;

Review Comment:
   this is used for memo debug



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1286759318

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.9 seconds
    load time: 693 seconds
    storage size: 17154821214 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221021182331_clickbench_pr_32500.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287732449

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.99 seconds
    load time: 568 seconds
    storage size: 17154655313 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022181341_clickbench_pr_32735.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287687899

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 40.19 seconds
    load time: 593 seconds
    storage size: 17154810760 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022090259_clickbench_pr_32717.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003977378


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
         return result;
     }
 
+    /**
+     * estimate join
+     */
+    public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+        JoinType joinType = join.getJoinType();
+        double rowCount = Double.MAX_VALUE;
+        if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+            rowCount = leftStats.getRowCount();
+        } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+            rowCount = rightStats.getRowCount();
+        } else if (joinType == JoinType.INNER_JOIN) {
+            if (join.getHashJoinConjuncts().isEmpty()) {
+                //TODO: consider other join conjuncts
+                rowCount = leftStats.getRowCount() * rightStats.getRowCount();
+            } else {
+                for (Expression joinConjunct : join.getHashJoinConjuncts()) {
+                    double tmpRowCount = estimateInnerJoin2(join,
+                            (EqualTo) joinConjunct, leftStats, rightStats);
+                    rowCount = Math.min(rowCount, tmpRowCount);
+                }
+            }
+        } else if (joinType == JoinType.LEFT_OUTER_JOIN) {
+            rowCount = leftStats.getRowCount();
+        } else if (joinType == JoinType.RIGHT_OUTER_JOIN) {
+            rowCount = rightStats.getRowCount();
+        } else if (joinType == JoinType.CROSS_JOIN) {
+            rowCount = CheckedMath.checkedMultiply(leftStats.getRowCount(),
+                    rightStats.getRowCount());
+        } else {
+            throw new RuntimeException("joinType is not supported");
+        }
+
+        StatsDeriveResult statsDeriveResult = new StatsDeriveResult(rowCount, Maps.newHashMap());
+        if (joinType.isRemainLeftJoin()) {
+            statsDeriveResult.merge(leftStats);
+        }
+        if (joinType.isRemainRightJoin()) {
+            statsDeriveResult.merge(rightStats);
+        }
+        statsDeriveResult.setRowCount(rowCount);
+        statsDeriveResult.setWidth(rightStats.getWidth() + leftStats.getWidth());
+        statsDeriveResult.setPenalty(0.0);
+        return statsDeriveResult;
+    }
+
     /**
      * Do estimate.
      * // TODO: since we have no column stats here. just use a fix ratio to compute the row count.
      */
-    public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+    public static StatsDeriveResult estimate2(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004043979


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/Literal.java:
##########
@@ -82,6 +82,10 @@ public static Literal of(Object value) {
 
     public abstract Object getValue();
 
+    public double getDouble() {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004053499


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
         return result;
     }
 
+    /**
+     * estimate join
+     */
+    public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+        JoinType joinType = join.getJoinType();
+        double rowCount = Double.MAX_VALUE;
+        if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+            rowCount = leftStats.getRowCount();
+        } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+            rowCount = rightStats.getRowCount();

Review Comment:
   Previous version, semi and anti filter half of rows. The factor 0.5 should be refined.
   In fact, currently any factor are acceptable, they will not change the best plan.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287727123

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 39.8 seconds
    load time: 562 seconds
    storage size: 17154644969 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022101245_clickbench_pr_32731.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003974662


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
  * Inspired by Presto.
  */
 public class CostCalculator {
+    static final double cpuWeight = 1;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004035628


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -104,7 +104,10 @@ public static void estimate(GroupExpression groupExpression) {
 
     private void estimate() {
         StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
-        groupExpression.getOwnerGroup().setStatistics(stats);
+        if (groupExpression.getOwnerGroup().getStatistics() == null
+                || (stats.getRowCount() < groupExpression.getOwnerGroup().getStatistics().getRowCount())) {

Review Comment:
   done



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculatorV2.java:
##########
@@ -99,7 +99,10 @@ public static void estimate(GroupExpression groupExpression) {
 
     private void estimate() {
         StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
-        groupExpression.getOwnerGroup().setStatistics(stats);
+        StatsDeriveResult originStats = groupExpression.getOwnerGroup().getStatistics();
+        if (originStats == null || originStats.getRowCount() > stats.getRowCount()) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004044540


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalOlapScan.java:
##########
@@ -105,7 +105,8 @@ public PreAggStatus getPreAggStatus() {
     public String toString() {
         return Utils.toSqlString("PhysicalOlapScan",
                 "qualified", Utils.qualifiedName(qualifier, olapTable.getName()),
-                "output", getOutput()
+                "output", getOutput(),
+                "stats=", statsDeriveResult

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow merged pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count

Posted by GitBox <gi...@apache.org>.
morrySnow merged PR #13375:
URL: https://github.com/apache/doris/pull/13375


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org