Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/14 07:27:34 UTC
[GitHub] [doris] englefly opened a new pull request, #13375: Estimate cost
englefly opened a new pull request, #13375:
URL: https://github.com/apache/doris/pull/13375
# Proposed changes
Issue Number: close #xxx
## Problem summary
Describe your changes.
## Checklist (Required)
1. Does it affect the original behavior:
- [ ] Yes
- [ ] No
- [ ] I don't know
2. Have unit tests been added:
- [ ] Yes
- [ ] No
- [ ] No Need
3. Has documentation been added or modified:
- [ ] Yes
- [ ] No
- [ ] No Need
4. Does it need to update dependencies:
- [ ] Yes
- [ ] No
5. Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [ ] No
## Further comments
If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287074463
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 42.28 seconds
load time: 591 seconds
storage size: 17154821287 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221021145135_clickbench_pr_32563.html
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003977227
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -69,6 +69,29 @@ private static class JoinEstimationResult {
public double rowCount = 0;
}
+ private static double estimateInnerJoin2(Join join, EqualTo equalto,
Review Comment:
done
[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287687641
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 39.58 seconds
load time: 566 seconds
storage size: 17154644791 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022170133_clickbench_pr_32713.html
[GitHub] [doris] morrySnow commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1001644423
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
* Inspired by Presto.
*/
public class CostCalculator {
+ static final double cpuWeight = 1;
Review Comment:
```suggestion
static final double CPU_WEIGHT = 1;
```
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
* Representation for group expression in cascades optimizer.
*/
public class GroupExpression {
+ private double cost = 0.0;
Review Comment:
Is this the lowest cost across all PhysicalProperties?
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/properties/PhysicalProperties.java:
##########
@@ -96,4 +96,10 @@ public int hashCode() {
}
return hashCode;
}
+
+ @Override
+ public String toString() {
+ return distributionSpec.toString() + " " + orderSpec.toString();
Review Comment:
It would be better to use the same format as the plan's `toString`.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
* Representation for group expression in cascades optimizer.
*/
public class GroupExpression {
+ private double cost = 0.0;
+ private CostEstimate costEstimate = null;
Review Comment:
What is this used for?
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -69,6 +69,29 @@ private static class JoinEstimationResult {
public double rowCount = 0;
}
+ private static double estimateInnerJoin2(Join join, EqualTo equalto,
Review Comment:
```suggestion
private static double estimateInnerJoinV2(Join join, EqualTo equalto,
```
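A minimal sketch, assuming the common textbook model behind "estimate by column NDV and table row count": for an equi-join `L.a = R.b` with uniformly distributed values, where the smaller value set is contained in the larger one, the output row count is |L| * |R| / max(ndv(a), ndv(b)). The Java below illustrates that formula together with the "minimum over hash conjuncts" loop visible in the `estimate` diff quoted later in this thread. `JoinRowCountSketch` and its methods are hypothetical names; this is not the body of `estimateInnerJoin2`/`estimateInnerJoinV2`, which the diff does not show.

```java
import java.util.List;

/** Hypothetical illustration of NDV-based equi-join cardinality; not Doris code. */
final class JoinRowCountSketch {
    private JoinRowCountSketch() {}

    /**
     * Textbook estimate for L JOIN R ON L.a = R.b:
     * |L| * |R| / max(ndv(a), ndv(b)), assuming uniform values and containment.
     */
    static double estimateInnerJoinRows(double leftRows, double rightRows,
                                        double leftNdv, double rightNdv) {
        double maxNdv = Math.max(leftNdv, rightNdv);
        if (maxNdv <= 0) {
            // Unknown or empty NDV: fall back to the cross-product size.
            return leftRows * rightRows;
        }
        return leftRows * rightRows / maxNdv;
    }

    /** One {leftNdv, rightNdv} pair per equality conjunct; keep the smallest estimate. */
    static double estimateWithConjuncts(double leftRows, double rightRows, List<double[]> ndvPairs) {
        if (ndvPairs.isEmpty()) {
            return leftRows * rightRows; // no conjuncts: cross product
        }
        double rowCount = Double.MAX_VALUE;
        for (double[] pair : ndvPairs) {
            rowCount = Math.min(rowCount,
                    estimateInnerJoinRows(leftRows, rightRows, pair[0], pair[1]));
        }
        return rowCount;
    }
}
```

Taking the minimum across conjuncts treats the most selective key as dominant; multiplying per-conjunct selectivities (an independence assumption) would give smaller estimates for multi-column keys.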
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
* Inspired by Presto.
*/
public class CostCalculator {
+ static final double cpuWeight = 1;
+ static final double memorWeight = 1;
+ static final double networkWeight = 1.5;
+ static final double penaltyWeight = 0.5;
+ static final double heavyOperatorPunishFactor = 6.0;
Review Comment:
Please add comments to explain these two factors.
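The constants above appear to be weights for folding several cost dimensions into one comparable number. A minimal sketch of such a weighted sum, assuming each weight simply multiplies its own cost component; the diff does not show how `CostCalculator` actually combines them, and `heavyOperatorPunishFactor` is left out because its use is not visible here. Names follow the `CONSTANT_CASE` suggestion above, with `memorWeight` spelled out as `MEMORY_WEIGHT`; the class and method are hypothetical.

```java
/** Hypothetical sketch of a weighted scalar cost; the weight values are copied from the diff. */
final class WeightedCostSketch {
    private WeightedCostSketch() {}

    /** Relative weight of CPU cost (e.g. rows processed). */
    static final double CPU_WEIGHT = 1;
    /** Relative weight of memory cost (e.g. bytes held in hash tables). */
    static final double MEMORY_WEIGHT = 1;
    /** Presumably higher because shuffles and broadcasts are comparatively expensive. */
    static final double NETWORK_WEIGHT = 1.5;
    /** Assumed: scales an extra penalty term attached to a plan node. */
    static final double PENALTY_WEIGHT = 0.5;

    /** Folds the cost components into one number so alternative plans can be compared. */
    static double scalarCost(double cpu, double memory, double network, double penalty) {
        return CPU_WEIGHT * cpu
                + MEMORY_WEIGHT * memory
                + NETWORK_WEIGHT * network
                + PENALTY_WEIGHT * penalty;
    }
}
```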
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculatorV2.java:
##########
@@ -99,7 +99,10 @@ public static void estimate(GroupExpression groupExpression) {
private void estimate() {
StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
- groupExpression.getOwnerGroup().setStatistics(stats);
+ StatsDeriveResult originStats = groupExpression.getOwnerGroup().getStatistics();
+ if (originStats == null || originStats.getRowCount() > stats.getRowCount()) {
Review Comment:
ditto
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -104,7 +104,10 @@ public static void estimate(GroupExpression groupExpression) {
private void estimate() {
StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
- groupExpression.getOwnerGroup().setStatistics(stats);
+ if (groupExpression.getOwnerGroup().getStatistics() == null
+ || (stats.getRowCount() < groupExpression.getOwnerGroup().getStatistics().getRowCount())) {
Review Comment:
Add a comment to explain `stats.getRowCount() < groupExpression.getOwnerGroup().getStatistics().getRowCount()`.
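The condition under review keeps a group's existing statistics unless the newly derived row count is smaller. A minimal sketch of that rule, with a hypothetical `GroupStatsSketch` standing in for the memo group. The rationale offered in the comment (all expressions in a group are logically equivalent, so differing row counts are estimation error) is an assumption, not something the PR states.

```java
/** Hypothetical stand-in for a memo group's statistics slot; not the Doris Group class. */
final class GroupStatsSketch {
    private Double rowCount; // null until some expression's stats have been derived

    /**
     * Mirrors the condition in the diff: accept the new estimate only if the group has
     * no statistics yet or the new row count is smaller. Assumed rationale: every
     * expression in a group produces the same logical result, so the group keeps the
     * smallest of the differing estimates.
     */
    void offer(double newRowCount) {
        if (rowCount == null || newRowCount < rowCount) {
            rowCount = newRowCount;
        }
    }

    Double getRowCount() {
        return rowCount;
    }
}
```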
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/Literal.java:
##########
@@ -82,6 +82,10 @@ public static Literal of(Object value) {
public abstract Object getValue();
+ public double getDouble() {
Review Comment:
Add Javadoc to this function to explain what it is used for.
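A minimal sketch of the kind of Javadoc requested, shown on a hypothetical `NumericLiteralSketch` class rather than the real `Literal`. The documented purpose (comparing a literal against column min/max statistics during filter estimation) is an assumption based on the `FilterEstimation` changes in the same PR, as is the choice to throw for non-numeric values.

```java
/** Hypothetical literal type for illustration; not the Doris Literal class. */
final class NumericLiteralSketch {
    private final Object value;

    NumericLiteralSketch(Object value) {
        this.value = value;
    }

    /**
     * Returns this literal's value as a double, so that it can be compared with a
     * column's min/max statistics when estimating filter selectivity.
     *
     * @return the numeric value widened to double
     * @throws UnsupportedOperationException if the value is not a {@link Number}
     */
    double getDouble() {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        throw new UnsupportedOperationException("not a numeric literal: " + value);
    }
}
```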
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/FilterEstimation.java:
##########
@@ -115,19 +105,166 @@ public StatsDeriveResult visitCompoundPredicate(CompoundPredicate predicate, Est
@Override
public StatsDeriveResult visitComparisonPredicate(ComparisonPredicate cp, EstimationContext context) {
+ boolean isNot = (context != null) && context.isNot;
Expression left = cp.left();
Expression right = cp.right();
- ColumnStat statsForLeft = ExpressionEstimation.estimate(left, stats);
- ColumnStat statsForRight = ExpressionEstimation.estimate(right, stats);
+ ColumnStat statsForLeft = ExpressionEstimation.estimate(left, inputStats);
+ ColumnStat statsForRight = ExpressionEstimation.estimate(right, inputStats);
double selectivity;
if (!(left instanceof Literal) && !(right instanceof Literal)) {
selectivity = calculateWhenBothChildIsColumn(cp, statsForLeft, statsForRight);
} else {
// For literal, it's max min is same value.
- selectivity = calculateWhenRightChildIsLiteral(cp, statsForLeft, statsForRight.getMaxValue());
+ selectivity = updateLeftStatsWhenRightChildIsLiteral(cp,
+ statsForLeft,
+ statsForRight.getMaxValue(),
+ isNot);
+ }
+ StatsDeriveResult outputStats = new StatsDeriveResult(inputStats);
+ //TODO: we take the assumption that func(A) and A have the same stats.
+ outputStats.updateBySelectivity(selectivity, cp.getInputSlots());
+ if (left.getInputSlots().size() == 1) {
+ Slot leftSlot = left.getInputSlots().iterator().next();
+ outputStats.updateColumnStatsForSlot(leftSlot, statsForLeft);
+ }
+ return outputStats;
+ }
+
+ private double updateLessThan(ColumnStat statsForLeft, double val,
Review Comment:
Do we have enough unit tests to cover this code?
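The diff for `updateLessThan` is cut off, so its exact behaviour is not visible. To illustrate what unit tests for this code would pin down, here is a minimal sketch of the textbook range estimate for `col < val` under a uniform-distribution assumption, including the `isNot` flag threaded through `visitComparisonPredicate`. `LessThanSelectivitySketch` is hypothetical, and the PR's method may differ (for example, it may also narrow the column's max value).

```java
/** Hypothetical sketch of range selectivity under a uniform-distribution assumption. */
final class LessThanSelectivitySketch {
    private LessThanSelectivitySketch() {}

    /** Estimated fraction of rows with col < val, given the column's min and max. */
    static double lessThan(double min, double max, double val, boolean isNot) {
        double selectivity;
        if (val <= min) {
            selectivity = 0.0; // every value is >= val
        } else if (val > max) {
            selectivity = 1.0; // every value is < val
        } else {
            // min < val <= max implies max > min, so the division is safe.
            selectivity = (val - min) / (max - min);
        }
        // NOT (col < val) keeps the complementary fraction.
        return isNot ? 1.0 - selectivity : selectivity;
    }
}
```

Each branch (below min, above max, inside the range, negated) is a natural unit-test case.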
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
return result;
}
+ /**
+ * estimate join
+ */
+ public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+ JoinType joinType = join.getJoinType();
+ double rowCount = Double.MAX_VALUE;
+ if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+ rowCount = leftStats.getRowCount();
+ } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+ rowCount = rightStats.getRowCount();
Review Comment:
Why do semi and anti joins filter out no rows?
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalOlapScan.java:
##########
@@ -105,7 +105,8 @@ public PreAggStatus getPreAggStatus() {
public String toString() {
return Utils.toSqlString("PhysicalOlapScan",
"qualified", Utils.qualifiedName(qualifier, olapTable.getName()),
- "output", getOutput()
+ "output", getOutput(),
+ "stats=", statsDeriveResult
Review Comment:
```suggestion
"stats", statsDeriveResult
```
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/ExpressionEstimation.java:
##########
@@ -56,12 +62,20 @@ public ColumnStat visit(Expression expr, StatsDeriveResult context) {
return expr.accept(this, context);
}
+ public ColumnStat visitCaseWhen(CaseWhen caseWhen, StatsDeriveResult context) {
+ throw new RuntimeException("ExpressionEstimation case-when not implemented");
+ }
+
+ public ColumnStat visitCast(Cast cast, StatsDeriveResult context) {
+ return cast.child().accept(this, context);
+ }
+
@Override
public ColumnStat visitLiteral(Literal literal, StatsDeriveResult context) {
- if (literal.isStringLiteral()) {
+ if (ColumnStat.MAX_MIN_UNSUPPORTED_TYPE.contains(literal.getDataType().toCatalogDataType())) {
return ColumnStat.UNKNOWN;
Review Comment:
Should we return a default value instead of `UNKNOWN`?
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
return result;
}
+ /**
+ * estimate join
+ */
+ public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+ JoinType joinType = join.getJoinType();
+ double rowCount = Double.MAX_VALUE;
+ if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+ rowCount = leftStats.getRowCount();
+ } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+ rowCount = rightStats.getRowCount();
+ } else if (joinType == JoinType.INNER_JOIN) {
+ if (join.getHashJoinConjuncts().isEmpty()) {
+ //TODO: consider other join conjuncts
+ rowCount = leftStats.getRowCount() * rightStats.getRowCount();
+ } else {
+ for (Expression joinConjunct : join.getHashJoinConjuncts()) {
+ double tmpRowCount = estimateInnerJoin2(join,
+ (EqualTo) joinConjunct, leftStats, rightStats);
+ rowCount = Math.min(rowCount, tmpRowCount);
+ }
+ }
+ } else if (joinType == JoinType.LEFT_OUTER_JOIN) {
+ rowCount = leftStats.getRowCount();
+ } else if (joinType == JoinType.RIGHT_OUTER_JOIN) {
+ rowCount = rightStats.getRowCount();
+ } else if (joinType == JoinType.CROSS_JOIN) {
+ rowCount = CheckedMath.checkedMultiply(leftStats.getRowCount(),
+ rightStats.getRowCount());
+ } else {
+ throw new RuntimeException("joinType is not supported");
+ }
+
+ StatsDeriveResult statsDeriveResult = new StatsDeriveResult(rowCount, Maps.newHashMap());
+ if (joinType.isRemainLeftJoin()) {
+ statsDeriveResult.merge(leftStats);
+ }
+ if (joinType.isRemainRightJoin()) {
+ statsDeriveResult.merge(rightStats);
+ }
+ statsDeriveResult.setRowCount(rowCount);
+ statsDeriveResult.setWidth(rightStats.getWidth() + leftStats.getWidth());
+ statsDeriveResult.setPenalty(0.0);
+ return statsDeriveResult;
+ }
+
/**
* Do estimate.
* // TODO: since we have no column stats here. just use a fix ratio to compute the row count.
*/
- public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+ public static StatsDeriveResult estimate2(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
Review Comment:
```suggestion
public static StatsDeriveResult estimateV2(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
```
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/ExpressionEstimation.java:
##########
@@ -56,12 +62,20 @@ public ColumnStat visit(Expression expr, StatsDeriveResult context) {
return expr.accept(this, context);
}
+ public ColumnStat visitCaseWhen(CaseWhen caseWhen, StatsDeriveResult context) {
+ throw new RuntimeException("ExpressionEstimation case-when not implemented");
Review Comment:
Remove the exception; use a default selectivity and add a TODO.
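A minimal sketch of the pattern requested: instead of throwing for an expression kind that has no estimator yet, return a default and leave a TODO, presumably so that one unsupported expression does not abort planning. The enum, the method, and the 0.5 default are all hypothetical; in the real `ExpressionEstimation` visitor the fallback would be a `ColumnStat` (for example `ColumnStat.UNKNOWN`) rather than a bare selectivity.

```java
/** Hypothetical sketch: degrade to a default estimate instead of throwing. */
final class SelectivityFallbackSketch {
    private SelectivityFallbackSketch() {}

    /** Hypothetical expression kinds, for illustration only. */
    enum ExprKind { COMPARISON, CASE_WHEN }

    /** Hypothetical default for expressions without an estimator; value chosen for illustration. */
    static final double DEFAULT_SELECTIVITY = 0.5;

    static double estimate(ExprKind kind, double comparisonSelectivity) {
        switch (kind) {
            case COMPARISON:
                return comparisonSelectivity;
            case CASE_WHEN:
            default:
                // TODO: derive an estimate from the CASE branches instead of a constant.
                return DEFAULT_SELECTIVITY;
        }
    }
}
```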
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1002866177
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/ExpressionEstimation.java:
##########
@@ -56,12 +62,20 @@ public ColumnStat visit(Expression expr, StatsDeriveResult context) {
return expr.accept(this, context);
}
+ public ColumnStat visitCaseWhen(CaseWhen caseWhen, StatsDeriveResult context) {
+ throw new RuntimeException("ExpressionEstimation case-when not implemented");
+ }
+
+ public ColumnStat visitCast(Cast cast, StatsDeriveResult context) {
+ return cast.child().accept(this, context);
+ }
+
@Override
public ColumnStat visitLiteral(Literal literal, StatsDeriveResult context) {
- if (literal.isStringLiteral()) {
+ if (ColumnStat.MAX_MIN_UNSUPPORTED_TYPE.contains(literal.getDataType().toCatalogDataType())) {
return ColumnStat.UNKNOWN;
Review Comment:
`UNKNOWN` is the default value.
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003972708
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
* Inspired by Presto.
*/
public class CostCalculator {
+ static final double cpuWeight = 1;
+ static final double memorWeight = 1;
+ static final double networkWeight = 1.5;
+ static final double penaltyWeight = 0.5;
+ static final double heavyOperatorPunishFactor = 6.0;
Review Comment:
done
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003964187
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
* Representation for group expression in cascades optimizer.
*/
public class GroupExpression {
+ private double cost = 0.0;
Review Comment:
No. It is the cost of this expression itself; the cost of enforcers is not counted.
[GitHub] [doris] hello-stephen commented on pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1289986198
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 38.79 seconds
load time: 563 seconds
storage size: 17154644849 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221025130527_clickbench_pr_33576.html
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003962955
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java:
##########
@@ -39,6 +40,8 @@
* Representation for group expression in cascades optimizer.
*/
public class GroupExpression {
+ private double cost = 0.0;
+ private CostEstimate costEstimate = null;
Review Comment:
This is used for debugging the memo.
[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1286759318
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 38.9 seconds
load time: 693 seconds
storage size: 17154821214 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221021182331_clickbench_pr_32500.html
[GitHub] [doris] hello-stephen commented on pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287732449
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 38.99 seconds
load time: 568 seconds
storage size: 17154655313 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022181341_clickbench_pr_32735.html
[GitHub] [doris] hello-stephen commented on pull request #13375: Estimate cost
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287687899
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 40.19 seconds
load time: 593 seconds
storage size: 17154810760 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022090259_clickbench_pr_32717.html
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003977378
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
return result;
}
+ /**
+ * estimate join
+ */
+ public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+ JoinType joinType = join.getJoinType();
+ double rowCount = Double.MAX_VALUE;
+ if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+ rowCount = leftStats.getRowCount();
+ } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+ rowCount = rightStats.getRowCount();
+ } else if (joinType == JoinType.INNER_JOIN) {
+ if (join.getHashJoinConjuncts().isEmpty()) {
+ //TODO: consider other join conjuncts
+ rowCount = leftStats.getRowCount() * rightStats.getRowCount();
+ } else {
+ for (Expression joinConjunct : join.getHashJoinConjuncts()) {
+ double tmpRowCount = estimateInnerJoin2(join,
+ (EqualTo) joinConjunct, leftStats, rightStats);
+ rowCount = Math.min(rowCount, tmpRowCount);
+ }
+ }
+ } else if (joinType == JoinType.LEFT_OUTER_JOIN) {
+ rowCount = leftStats.getRowCount();
+ } else if (joinType == JoinType.RIGHT_OUTER_JOIN) {
+ rowCount = rightStats.getRowCount();
+ } else if (joinType == JoinType.CROSS_JOIN) {
+ rowCount = CheckedMath.checkedMultiply(leftStats.getRowCount(),
+ rightStats.getRowCount());
+ } else {
+ throw new RuntimeException("joinType is not supported");
+ }
+
+ StatsDeriveResult statsDeriveResult = new StatsDeriveResult(rowCount, Maps.newHashMap());
+ if (joinType.isRemainLeftJoin()) {
+ statsDeriveResult.merge(leftStats);
+ }
+ if (joinType.isRemainRightJoin()) {
+ statsDeriveResult.merge(rightStats);
+ }
+ statsDeriveResult.setRowCount(rowCount);
+ statsDeriveResult.setWidth(rightStats.getWidth() + leftStats.getWidth());
+ statsDeriveResult.setPenalty(0.0);
+ return statsDeriveResult;
+ }
+
/**
* Do estimate.
* // TODO: since we have no column stats here. just use a fix ratio to compute the row count.
*/
- public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+ public static StatsDeriveResult estimate2(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
Review Comment:
done
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004043979
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/Literal.java:
##########
@@ -82,6 +82,10 @@ public static Literal of(Object value) {
public abstract Object getValue();
+ public double getDouble() {
Review Comment:
done
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004053499
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -124,11 +147,56 @@ private static JoinEstimationResult estimateInnerJoin(PhysicalHashJoin join, Equ
return result;
}
+ /**
+ * estimate join
+ */
+ public static StatsDeriveResult estimate(StatsDeriveResult leftStats, StatsDeriveResult rightStats, Join join) {
+ JoinType joinType = join.getJoinType();
+ double rowCount = Double.MAX_VALUE;
+ if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
+ rowCount = leftStats.getRowCount();
+ } else if (joinType == JoinType.RIGHT_SEMI_JOIN || joinType == JoinType.RIGHT_ANTI_JOIN) {
+ rowCount = rightStats.getRowCount();
Review Comment:
In the previous version, semi and anti joins filtered out half of the rows; that 0.5 factor should be refined.
In fact, any factor is currently acceptable, because it does not change the best plan.
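To make the reply concrete: the new `estimate` keeps the preserved side's row count unchanged for semi and anti joins (a factor of 1.0), while, per the reply, the previous version kept half (0.5). A minimal sketch that parameterises that factor; `SemiAntiJoinSketch` and its constants are hypothetical, and how the factor should be refined is left open, as in the discussion.

```java
/** Hypothetical sketch of a fixed-fraction semi/anti join estimate; not Doris code. */
final class SemiAntiJoinSketch {
    private SemiAntiJoinSketch() {}

    /** Factor of the previous version, per the discussion: half the rows survive. */
    static final double PREVIOUS_FACTOR = 0.5;
    /** Factor implied by the new code in the diff: the preserved side's row count is kept. */
    static final double CURRENT_FACTOR = 1.0;

    /**
     * Rows produced by a semi or anti join, estimated as a fixed fraction of the
     * preserved side (left input for LEFT_SEMI/LEFT_ANTI, right input for RIGHT_SEMI/RIGHT_ANTI).
     */
    static double estimateRows(double preservedSideRows, double factor) {
        if (factor < 0.0 || factor > 1.0) {
            throw new IllegalArgumentException("factor must be in [0, 1]: " + factor);
        }
        return preservedSideRows * factor;
    }
}
```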
[GitHub] [doris] hello-stephen commented on pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13375:
URL: https://github.com/apache/doris/pull/13375#issuecomment-1287727123
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 39.8 seconds
load time: 562 seconds
storage size: 17154644969 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022101245_clickbench_pr_32731.html
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1003974662
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java:
##########
@@ -47,6 +48,11 @@
* Inspired by Presto.
*/
public class CostCalculator {
+ static final double cpuWeight = 1;
Review Comment:
done
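The diff adds a `cpuWeight` constant to `CostCalculator`, whose Javadoc says it is "Inspired by Presto". The sketch below shows how per-resource weights of this kind are commonly used. Only `cpuWeight = 1` appears in the diff. The memory and network weights, the `CostEstimate` class, and every cost number are hypothetical assumptions chosen for illustration. The idea is that CPU, memory, and network cost components are collapsed into one comparable number.

```java
// Hypothetical sketch of a weighted cost model; not the Doris CostCalculator.
final class WeightedCostSketch {
    // Only cpuWeight = 1 comes from the reviewed diff; the other weights are assumptions.
    static final double CPU_WEIGHT = 1;
    static final double MEMORY_WEIGHT = 1;
    static final double NETWORK_WEIGHT = 1.5;

    // Per-operator cost components (hypothetical stand-in for a cost estimate object).
    static final class CostEstimate {
        final double cpuCost;
        final double memoryCost;
        final double networkCost;

        CostEstimate(double cpuCost, double memoryCost, double networkCost) {
            this.cpuCost = cpuCost;
            this.memoryCost = memoryCost;
            this.networkCost = networkCost;
        }

        // Weighted sum used to rank alternative plans against each other.
        double total() {
            return CPU_WEIGHT * cpuCost + MEMORY_WEIGHT * memoryCost + NETWORK_WEIGHT * networkCost;
        }
    }

    public static void main(String[] args) {
        // Made-up numbers: a shuffle join moves more data, a broadcast join uses more memory.
        CostEstimate shuffleJoin = new CostEstimate(100, 20, 80);
        CostEstimate broadcastJoin = new CostEstimate(120, 60, 10);
        System.out.println(shuffleJoin.total());   // prints 240.0
        System.out.println(broadcastJoin.total()); // prints 195.0
    }
}
```

With these assumed weights, the broadcast alternative wins. Raising `NETWORK_WEIGHT` penalizes data movement more heavily, which is the kind of tuning such weights allow.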
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004035628
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -104,7 +104,10 @@ public static void estimate(GroupExpression groupExpression) {
private void estimate() {
StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
- groupExpression.getOwnerGroup().setStatistics(stats);
+ if (groupExpression.getOwnerGroup().getStatistics() == null
+ || (stats.getRowCount() < groupExpression.getOwnerGroup().getStatistics().getRowCount())) {
Review Comment:
done
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculatorV2.java:
##########
@@ -99,7 +99,10 @@ public static void estimate(GroupExpression groupExpression) {
private void estimate() {
StatsDeriveResult stats = groupExpression.getPlan().accept(this, null);
- groupExpression.getOwnerGroup().setStatistics(stats);
+ StatsDeriveResult originStats = groupExpression.getOwnerGroup().getStatistics();
+ if (originStats == null || originStats.getRowCount() > stats.getRowCount()) {
Review Comment:
done
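Both `estimate()` hunks replace an unconditional `setStatistics(stats)` call. With the change, a group's statistics are overwritten only if none are set yet or if the newly estimated expression yields a smaller row count. The sketch below models only that update rule. `Stats`, `Group`, and `offerStatistics` are hypothetical stand-ins for `StatsDeriveResult` and the memo group, not Doris classes.

```java
// Hypothetical sketch, not Doris code: a group keeps the lowest row-count estimate it has seen.
final class GroupStatsSketch {
    // Minimal stand-in for StatsDeriveResult; only the row count is modeled.
    static final class Stats {
        final double rowCount;

        Stats(double rowCount) {
            this.rowCount = rowCount;
        }
    }

    static final class Group {
        private Stats statistics; // null until the first expression in the group is estimated

        Stats getStatistics() {
            return statistics;
        }

        // Same condition as the diff: replace only if unset or the new estimate is strictly smaller.
        void offerStatistics(Stats stats) {
            if (statistics == null || statistics.rowCount > stats.rowCount) {
                statistics = stats;
            }
        }
    }

    public static void main(String[] args) {
        Group group = new Group();
        // Different expressions in one group may derive different row-count estimates.
        for (double rows : new double[] {1200, 800, 950}) {
            group.offerStatistics(new Stats(rows));
        }
        System.out.println(group.getStatistics().rowCount); // prints 800.0
    }
}
```

On a tie, the existing statistics are kept, which matches both the `<` check in `StatsCalculator` and the `>` check in `StatsCalculatorV2`.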
[GitHub] [doris] englefly commented on a diff in pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #13375:
URL: https://github.com/apache/doris/pull/13375#discussion_r1004044540
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalOlapScan.java:
##########
@@ -105,7 +105,8 @@ public PreAggStatus getPreAggStatus() {
public String toString() {
return Utils.toSqlString("PhysicalOlapScan",
"qualified", Utils.qualifiedName(qualifier, olapTable.getName()),
- "output", getOutput()
+ "output", getOutput(),
+ "stats=", statsDeriveResult
Review Comment:
done
[GitHub] [doris] morrySnow merged pull request #13375: [feature](nereids) Estimate plan cost by column ndv and table row count
Posted by GitBox <gi...@apache.org>.
morrySnow merged PR #13375:
URL: https://github.com/apache/doris/pull/13375