You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2021/03/04 13:52:17 UTC

[GitHub] [calcite] hqx871 opened a new pull request #2363: [CALCITE-3574] optimize sort cost formula (hqx871@gmail.com)

hqx871 opened a new pull request #2363:
URL: https://github.com/apache/calcite/pull/2363


   The old method to compute the cost of sort has some problem.
   1. when there is no need to sort, it still to compute the cpu cost of sort.
   2. use n * log(n) * rowBytes to estimate the cpu cost may be inaccurate, where n means the output row count of the sort operator, and rowBytes means the average bytes of one row .
   
   Instead, I give follow suggestion.
   1. the cpu cost is zero if there is no need to sort.
   2. use m * log(n)* rowBytes to compute the cpu cost, where m is the sum of offset + limit and n means input row count.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591517936



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       I wonder if the cost should account for the number of items in `getFieldCollations()`.
   
   In other words, it would be nice if `cpu` cost reflected the cost to sort the items.
   
   Currently, `bytesPerRow` is multiplied by `Math.log` which is wrong.
   
   The better formula should be `heapDepth*comparisonCost` where `comparisonCost` should be like `collation.getFieldCollations().size()`.
   
   I believe `bytesPerRow` should be eliminated as sort performs NO projection, and it should be safe to assume sort returns the same input data, so it does not spend CPU on transformations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591384463



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       I agree. I have try to add a RexLiteral.longValue method, but found there are too many place use RexLiteral.intValue, such as RelMdRowCount/RelMdMinRowCount/RelMdMaxRowCount.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 edited a comment on pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 edited a comment on pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#issuecomment-795271299


   hi, 
   I found duplicate code in EnumerableLimitSort. Shall I close the pr or change the EnumerableLimitSort?. code as follow
   ```
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       final double rowCount = mq.getRowCount(this.input).doubleValue();
       double toSort = getValue(this.fetch, rowCount);
       if (this.offset != null) {
         toSort += getValue(this.offset, rowCount);
       }
       // we need to sort at most rowCount rows
       toSort = Math.min(rowCount, toSort);
   
       // we need to process rowCount rows, and for every row
       // we search the key in a TreeMap with at most toSort entries
       final double lookup = Math.max(1., Math.log(toSort));
       final double bytesPerRow = this.getRowType().getFieldCount() * 4.;
       final double cpu = (rowCount * lookup) * bytesPerRow;
   
       RelOptCost cost = planner.getCostFactory().makeCost(rowCount, cpu, 0);
       return cost;
     }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591450722



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       I have logged https://issues.apache.org/jira/browse/CALCITE-4531 for `intValue` deprecation.
   I am afraid `longValue` won't solve the problem since the value might exceed `Long.MAX_VALUE` as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591578104



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       use collation.getFieldCollations().size() to replace project row size should also make the test fail




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591384463



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       I agree. I have try to add a RexLiteral.longValue method, but found there are too many place use RexLiteral.intValue to get offset or fetch value, such as RelMdRowCount/RelMdMinRowCount/RelMdMaxRowCount.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591511306



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, reason);

Review comment:
       I will do it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591492029



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, reason);

Review comment:
       please add `sql` and the output plan to the `reason`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 removed a comment on pull request #2363: [CALCITE-4522] Sort cost should account for the number of columns in collation

Posted by GitBox <gi...@apache.org>.

hqx871 removed a comment on pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#issuecomment-795271299


   hi, 
   I found duplicate code in EnumerableLimitSort. Shall I close the pr or change the EnumerableLimitSort?. code as follow
   ```
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       final double rowCount = mq.getRowCount(this.input).doubleValue();
       double toSort = getValue(this.fetch, rowCount);
       if (this.offset != null) {
         toSort += getValue(this.offset, rowCount);
       }
       // we need to sort at most rowCount rows
       toSort = Math.min(rowCount, toSort);
   
       // we need to process rowCount rows, and for every row
       // we search the key in a TreeMap with at most toSort entries
       final double lookup = Math.max(1., Math.log(toSort));
       final double bytesPerRow = this.getRowType().getFieldCount() * 4.;
       final double cpu = (rowCount * lookup) * bytesPerRow;
   
       RelOptCost cost = planner.getCostFactory().makeCost(rowCount, cpu, 0);
       return cost;
     }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591578104



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       use collation.getFieldCollations().size() to replace to project row size should also make the test fail




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591302751



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       I make a simple query and found the LogicalSort will be implemented to EnumerableLimit and EnumerableSort. The offset and fetch will be both null, so the sort cost is still nLogN.
   Query:
   select * from foodmart.sales_fact_1997 order by cust_id limit 10
   Plan:
   EnumerableLimit(fetch=[10]): rowcount = 10.0, cumulative cost = {210.0 rows, 3795.1361487904733 cpu, 0.0 io}, id = 35
     EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 100.0, cumulative cost = {200.0 rows, 3785.1361487904733 cpu, 0.0 io}, id = 34
       EnumerableTableScan(table=[[foodmart, sales_fact_1997]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 20
   
   I also checked the EnumerableSort  code, as follow:
   
     public EnumerableSort(RelOptCluster cluster, RelTraitSet traitSet,
         RelNode input, RelCollation collation, @Nullable RexNode offset, @Nullable RexNode fetch) {
       super(cluster, traitSet, input, collation, offset, fetch);
       assert getConvention() instanceof EnumerableConvention;
       assert getConvention() == input.getConvention();
       assert fetch == null : "fetch must be null";
       assert offset == null : "offset must be null";
     }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] rubenada commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

rubenada commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591266021



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       If I am not mistaken, this change could have some side-effects, e.g.  `EnumerableSort` currently does not override `computeSelfCost`, it uses `Sort`'s one.  `EnumerableSort` implements a nLogn sorting algorithm, so the current `Sort#computeSelfCost` suits better than the new proposed one. I think if we modify the formula here, then we should override `EnumerableSort#computeSelfCost` and apply the old formula there.
   However, other downstream projects with their own sort operators (which currently rely on the current formula) might be impacted as well, so this could be seen somehow as a breaking change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591590537



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       It is OK to update the test outputs, especially when they verify too much. It would be bad if the plan shape becomes worse after the change.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591293403



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);

Review comment:
       Thanks, I set offset to be 0 if the it is instance of RexDynamicParam as follow. Am I right ?
        offsetValue = offset == null || offset instanceof RexDynamicParam ? 0 : RexLiteral.intValue(offset);
   
   When offset is a RexCall, the RexLiteral.intValue will compute the final value.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591222189



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       >I do not find a way to add a clarification message for these matcher
   
   Please use `String reason` parameter: http://hamcrest.org/JavaHamcrest/javadoc/2.1/org/hamcrest/MatcherAssert.html#assertThat-java.lang.String-T-org.hamcrest.Matcher-




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591509283



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {

Review comment:
       I see




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] rubenada commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

rubenada commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591333731



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       Yes, your're right, the cost of EnumerableSort will effectively continue being nLogn

##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       Yes, you're right, the cost of EnumerableSort will effectively continue being nLogn




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591511079



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, reason);
+  }
+
+  private RelOptCost computeRelSelfCost(RelNode rel) {

Review comment:
       I will do it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on pull request #2363: [CALCITE-4522] Sort cost should account for the number of columns in collation

Posted by GitBox <gi...@apache.org>.

hqx871 commented on pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#issuecomment-799021206


   I made a little change as julianhyde suggested at https://issues.apache.org/jira/browse/CALCITE-4522
   - If fetch is zero, cpu cost is zero; otherwise,
   
   - if there are no sort keys, cost is min(fetch + offset, inputRowCount) * bytesPerRow; otherwise
   
   - cost is inputRowCount * log(min(fetch + offset, inputRowCount)) * bytesPerRow.
   
   a method Util.nLogM(n, m) would be useful, where n is the number of input rows, and m is the number of active rows (and therefore determines the number of times each row is compared and/or moved). We would call it as follows: Util.nLogM(inputRowCount, fetch + offset)). It would make sure that m is at least e (and therefore log is at least 1), and make sure that m is no greater than n.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591220447



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       Suppose I have never seen the test code, and I see the following failure in the CI output:
   
   ```
   Expected: not null
   but: was null
   java.lang.AssertionError:
   Expected: not null
   but: was null
   ```
   
   What should I do? Why "expected not null"? Why "was null"? What exactly was null?
   
   The proper assertion failure should look like a good bug report rather than "go figure out what went wrong".
   
   Bad assertions and bad test code make maintenance much harder.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591196599



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       Hi, I learned these notNullValue/is matcher from other test.  I found they will give a message by themselves.  Take the "is" matcher for example, when fail, the message like:
   Expected: is <1.0>
        but: was <0.0>
   java.lang.AssertionError: 
   Expected: is <1.0>
        but: was <0.0>
   I do not find a way to add a clarification message for these matcher




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591354987



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       I found duplicate code in EnumerableLimitSort. Shall I close the pr or change the EnumerableLimitSort?. code as follow
   ```
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       final double rowCount = mq.getRowCount(this.input).doubleValue();
       double toSort = getValue(this.fetch, rowCount);
       if (this.offset != null) {
         toSort += getValue(this.offset, rowCount);
       }
       // we need to sort at most rowCount rows
       toSort = Math.min(rowCount, toSort);
   
       // we need to process rowCount rows, and for every row
       // we search the key in a TreeMap with at most toSort entries
       final double lookup = Math.max(1., Math.log(toSort));
       final double bytesPerRow = this.getRowType().getFieldCount() * 4.;
       final double cpu = (rowCount * lookup) * bytesPerRow;
   
       RelOptCost cost = planner.getCostFactory().makeCost(rowCount, cpu, 0);
       return cost;
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591198566



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);

Review comment:
       I will remove the rel instanceof LogicalSort assert, as it seams no value here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591248399



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       Thanks, I recall that I can use assertEquals(double expected, double actual, Supplier<String> messageSupplier), just as
   in previous pr you tell me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591341149



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       `RexLiteral.intValue` would do narrowing conversion, so it would result negative values for values that exceed `Integer.MAX_VALUE`

##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       `RexLiteral.intValue` would do narrowing conversion, so it would result in negative values for values that exceed `Integer.MAX_VALUE`

##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       `RexLiteral.intValue` would do narrowing conversion, so it would yield negative values for values that exceed `Integer.MAX_VALUE`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591248399



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       Thanks, I recall that I can use assertEquals(double expected, double actual, Supplier<String> messageSupplier), just as
   in previous pr you have told me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591201550



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());

Review comment:
       I know the cost won't be null, but the ide warns me it could be. So I add the assert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591302751



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       I make a simple query and found the LogicalSort will be implemented to EnumerableLimit and EnumerableSort. The offset and fetch of  EnumerableSort will be both null, so the sort cost is still nLogN.
   Query:
   select * from foodmart.sales_fact_1997 order by cust_id limit 10
   Plan:
   EnumerableLimit(fetch=[10]): rowcount = 10.0, cumulative cost = {210.0 rows, 3795.1361487904733 cpu, 0.0 io}, id = 35
     EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 100.0, cumulative cost = {200.0 rows, 3785.1361487904733 cpu, 0.0 io}, id = 34
       EnumerableTableScan(table=[[foodmart, sales_fact_1997]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 20
   
   I also checked the EnumerableSort  code, as follow:
   
     public EnumerableSort(RelOptCluster cluster, RelTraitSet traitSet,
         RelNode input, RelCollation collation, @Nullable RexNode offset, @Nullable RexNode fetch) {
       super(cluster, traitSet, input, collation, offset, fetch);
       assert getConvention() instanceof EnumerableConvention;
       assert getConvention() == input.getConvention();
       assert fetch == null : "fetch must be null";
       assert offset == null : "offset must be null";
     }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591492446



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, reason);
+  }
+
+  private RelOptCost computeRelSelfCost(RelNode rel) {

Review comment:
       ```suggestion
     private static RelOptCost computeRelSelfCost(RelNode rel) {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#issuecomment-795271299


   hi, @rubenada 
   I found duplicate code in EnumerableLimitSort. Shall I close the pr or change the EnumerableLimitSort?. code as follow
   ```
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       final double rowCount = mq.getRowCount(this.input).doubleValue();
       double toSort = getValue(this.fetch, rowCount);
       if (this.offset != null) {
         toSort += getValue(this.offset, rowCount);
       }
       // we need to sort at most rowCount rows
       toSort = Math.min(rowCount, toSort);
   
       // we need to process rowCount rows, and for every row
       // we search the key in a TreeMap with at most toSort entries
       final double lookup = Math.max(1., Math.log(toSort));
       final double bytesPerRow = this.getRowType().getFieldCount() * 4.;
       final double cpu = (rowCount * lookup) * bytesPerRow;
   
       RelOptCost cost = planner.getCostFactory().makeCost(rowCount, cpu, 0);
       return cost;
     }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#issuecomment-793463051


   Thanks, @chunweilei , @xy2953396112. Maybe you are right, I will do it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591354987



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       I found duplicate code in EnumerableLimitSort. Shall I close the pr or change the EnumerableLimitSort?. code as follow
   ```
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       final double rowCount = mq.getRowCount(this.input).doubleValue();
       double toSort = getValue(this.fetch, rowCount);
       if (this.offset != null) {
         toSort += getValue(this.offset, rowCount);
       }
       // we need to sort at most rowCount rows
       toSort = Math.min(rowCount, toSort);
   
       // we need to process rowCount rows, and for every row
       // we search the key in a TreeMap with at most toSort entries
       final double lookup = Math.max(1., Math.log(toSort));
       final double bytesPerRow = this.getRowType().getFieldCount() * 4.;
       final double cpu = (rowCount * lookup) * bytesPerRow;
   
       RelOptCost cost = planner.getCostFactory().makeCost(rowCount, cpu, 0);
       return cost;
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591198918



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+

Review comment:
       I will remove these




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591302751



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       Query:
   select * from foodmart.sales_fact_1997 order by cust_id limit 10
   Plan:
   EnumerableLimit(fetch=[10]): rowcount = 10.0, cumulative cost = {210.0 rows, 3795.1361487904733 cpu, 0.0 io}, id = 35
     EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 100.0, cumulative cost = {200.0 rows, 3785.1361487904733 cpu, 0.0 io}, id = 34
       EnumerableTableScan(table=[[foodmart, sales_fact_1997]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 20
   
   I read the EnumerableSort and found the offset and fetch will be both null, so the sort cost is still nLogN.  code as follow:
   
     public EnumerableSort(RelOptCluster cluster, RelTraitSet traitSet,
         RelNode input, RelCollation collation, @Nullable RexNode offset, @Nullable RexNode fetch) {
       super(cluster, traitSet, input, collation, offset, fetch);
       assert getConvention() instanceof EnumerableConvention;
       assert getConvention() == input.getConvention();
       assert fetch == null : "fetch must be null";
       assert offset == null : "offset must be null";
     }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591176768



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       Please add a clarification message that explains why a specific value expected (like above)

##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);

Review comment:
       Please improve assertion style.
   The current assert would fail like `query should be sort, expected true got false` which is not really actionable. Why `expected true`? Why `query should be sort`?
   
   I suggest the following:
   1) Ensure input SQL is a part of the message
   2) Prefer `instanceOf(LogicalSort.class)` matcher as it would display the actual input class and the desired one
   3) It would be a great plus if you printed the query plan in case of failure. In other words, it is with adding the full plan into the message. Then it would contain both input SQL, and the actual plan which would make CI failure analysis way easier (think of poor developers who would observe the failure after an innocent planner fix)
   4) As you might guess it might be expensive to compute the plan always, so JUnit5 assertion with lambdas might help here. On the other hand, three tests won't hurt much, so even "compute plan always" might be just fine
   
   
   
   
   Please ensure the input SQL is a part of the error message.
   

##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());

Review comment:
       Please add a clarification message that explains why non-null value expected (like above)

##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+

Review comment:
       Frankly speaking, I am inclined that these markers add little value, and they go out of date quite soon

##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {

Review comment:
       Thanks for factoring out the verification function. That rocks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r589134468



##########
File path: core/src/test/java/org/apache/calcite/test/RelCostTest.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.test;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.volcano.VolcanoPlanner;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelRoot;
+import org.apache.calcite.rel.logical.LogicalSort;
+import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.util.Util;
+
+import org.junit.jupiter.api.Test;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.hamcrest.MatcherAssert.assertThat;
+
+/**
+ * Unit test for {@link RelOptCost}. See

Review comment:
       RelMetadataTest has too many method, a new class may be more clear and readable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591354987



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;
-    final double cpu = Util.nLogN(rowCount) * bytesPerRow;
+    // When output count + offset is smaller than input count, we can use heap sort,
+    // otherwise, we can use heap sort, quick sort and so on
+    final double heapSize = Math.min(rowCount + offsetValue, inCount);

Review comment:
       I found duplicate code in EnumerableLimitSort. Shall I close the pr or change the EnumerableLimitSort. code as follow
   ```
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
         RelMetadataQuery mq) {
       final double rowCount = mq.getRowCount(this.input).doubleValue();
       double toSort = getValue(this.fetch, rowCount);
       if (this.offset != null) {
         toSort += getValue(this.offset, rowCount);
       }
       // we need to sort at most rowCount rows
       toSort = Math.min(rowCount, toSort);
   
       // we need to process rowCount rows, and for every row
       // we search the key in a TreeMap with at most toSort entries
       final double lookup = Math.max(1., Math.log(toSort));
       final double bytesPerRow = this.getRowType().getFieldCount() * 4.;
       final double cpu = (rowCount * lookup) * bytesPerRow;
   
       RelOptCost cost = planner.getCostFactory().makeCost(rowCount, cpu, 0);
       return cost;
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591572730



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       > I have try to deal with above problem, but when I change it, some test failed. such as JdbcTest.testAlmostBushy
   
   What are the failures? Does the change improve things? Does it make plans worse?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] asfgit closed pull request #2363: [CALCITE-4522] Sort cost should account for the number of columns in collation

Posted by GitBox <gi...@apache.org>.

asfgit closed pull request #2363:
URL: https://github.com/apache/calcite/pull/2363


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] rubenada commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

rubenada commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591260387



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);

Review comment:
       Theoretically, `offset` could not be a `RexLiteral` (it could be a `RexCall` or a `RexDynamicParam`), which would lead to an `AssertionError` when calling `RexLiteral.intValue`. To be on the safe side, this should be checked before calling this function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] xy2953396112 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

xy2953396112 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r589965470



##########
File path: core/src/test/java/org/apache/calcite/test/RelCostTest.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.test;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.volcano.VolcanoPlanner;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelRoot;
+import org.apache.calcite.rel.logical.LogicalSort;
+import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.util.Util;
+
+import org.junit.jupiter.api.Test;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.hamcrest.MatcherAssert.assertThat;
+
+/**
+ * Unit test for {@link RelOptCost}. See

Review comment:
       It seems better to put those tests together with the original tests instead of writing a new test class, such as ` RelMetadataTest`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591201550



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());

Review comment:
       I know the cost won't be null, but the ide warns me it could be. So I add the assert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591492723



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {

Review comment:
       ```suggestion
     private static void checkCpuCost(String sql, double expected, String reason) {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591179092



##########
File path: core/src/test/java/org/apache/calcite/test/RelCostTest.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.test;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.volcano.VolcanoPlanner;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelRoot;
+import org.apache.calcite.rel.logical.LogicalSort;
+import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.util.Util;
+
+import org.junit.jupiter.api.Test;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.hamcrest.MatcherAssert.assertThat;
+
+/**
+ * Unit test for {@link RelOptCost}. See

Review comment:
       I agree with @chunweilei , something like `SortCostTest` might work slightly better here, however, I have no strong preference.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591196599



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());
+    final double result = cost.getCpu();
+    assertThat(result, is(expected));

Review comment:
       Hi, I learned these notNullValue/is matcher from other test.  I found they will give a clarification message by themselves.  Take the notNullValue for example, when fail, the message like:
   Expected: not null
        but: was null
   java.lang.AssertionError: 
   Expected: not null
        but: was null
   I do not find a way to add a clarification message for these matcher




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591334749



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3415,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, () -> "cpu cost is not as expected <"
+        + expected + ">, but <" + result + ">, plan as follow:\n"
+        + RelOptUtil.toString(rel, SqlExplainLevel.ALL_ATTRIBUTES));

Review comment:
       You've added `...<"        + expected + ">, but <" + result + ">...` to the exception message. However, it would make the message complicated with no extra gain.
   `assertEquals` would print `expected` vs `actual` on its own, so `message` parameter (which you write in the test code) *must not* include `expected vs actual`.
   
   On the other hand, `message` parameter must explain **why** the value is expected like that.
   
   For instance: "sort limit exceeds table size => cost should be dominated by table size", "limit is 0, cost must be 0", and so on.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591271476



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3415,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, () -> "cpu cost is not as expected <"
+        + expected + ">, but <" + result + ">, plan as follow:\n"
+        + RelOptUtil.toString(rel, SqlExplainLevel.ALL_ATTRIBUTES));

Review comment:
       Please provide the output of the failure. Does it look like a good bug report to you?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591558511



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       I also confused by the bytesPerRow. 
   1. collation.getFieldCollations().size() seams better than project row size.
   2. RelMetadataQuery.getAverageRowSize also be better than project field count * 4 to compute bytesPerRow
   3. the log use base E, not 2.
   I have try to deal with above problem, but when I change it, some test failed. such as JdbcTest.testAlmostBushy. I do not know making a change will affect other class or not, so I just keep it. and only change nlogn to mlogn




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591310635



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3415,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    RelOptCost cost = computeRelSelfCost(rel);
+    final double result = cost.getCpu();
+    assertEquals(expected, result, () -> "cpu cost is not as expected <"
+        + expected + ">, but <" + result + ">, plan as follow:\n"
+        + RelOptUtil.toString(rel, SqlExplainLevel.ALL_ATTRIBUTES));

Review comment:
       hi @vlsi, could you give me an example? I am poor in English.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] chunweilei commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

chunweilei commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r589130629



##########
File path: core/src/test/java/org/apache/calcite/test/RelCostTest.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.test;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.volcano.VolcanoPlanner;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelRoot;
+import org.apache.calcite.rel.logical.LogicalSort;
+import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.util.Util;
+
+import org.junit.jupiter.api.Test;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.hamcrest.MatcherAssert.assertThat;
+
+/**
+ * Unit test for {@link RelOptCost}. See

Review comment:
       Do we really have to add a new test class? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591518869



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       `discourages pushing a project through a sort` should be addressed in `project` costing model. I do not understand why `sort` needs to account for `projects`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591384463



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,20 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    int offsetValue = offset == null || offset instanceof RexDynamicParam ? 0
+        : RexLiteral.intValue(offset);

Review comment:
       I agree. I have try to add a RexLiteral.longValue method, but found there are too many place use RexLiteral.intValue, such as RelMdRowCount.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort cost should account for the number of columns in collation

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591888771



##########
File path: plus/src/test/java/org/apache/calcite/adapter/tpcds/TpcdsTest.java
##########
@@ -220,9 +220,9 @@ protected void foo(CalciteAssert.AssertThat with, String tableName,
         .withHook(Hook.PROGRAM, handler(true, 2))
         .explainMatches("including all attributes ",
             CalciteAssert.checkMaskedResultContains(""
-                + "EnumerableCalc(expr#0..9=[{inputs}], expr#10=[/($t4, $t3)], expr#11=[CAST($t10):INTEGER NOT NULL], expr#12=[*($t4, $t4)], expr#13=[/($t12, $t3)], expr#14=[-($t5, $t13)], expr#15=[1], expr#16=[=($t3, $t15)], expr#17=[null:BIGINT], expr#18=[-($t3, $t15)], expr#19=[CASE($t16, $t17, $t18)], expr#20=[/($t14, $t19)], expr#21=[0.5:DECIMAL(2, 1)], expr#22=[POWER($t20, $t21)], expr#23=[CAST($t22):INTEGER NOT NULL], expr#24=[/($t23, $t11)], expr#25=[/($t6, $t3)], expr#26=[CAST($t25):INTEGER NOT NULL], expr#27=[*($t6, $t6)], expr#28=[/($t27, $t3)], expr#29=[-($t7, $t28)], expr#30=[/($t29, $t19)], expr#31=[POWER($t30, $t21)], expr#32=[CAST($t31):INTEGER NOT NULL], expr#33=[/($t32, $t26)], expr#34=[/($t8, $t3)], expr#35=[CAST($t34):INTEGER NOT NULL], expr#36=[*($t8, $t8)], expr#37=[/($t36, $t3)], expr#38=[-($t9, $t37)], expr#39=[/($t38, $t19)], expr#40=[POWER($t39, $t21)], expr#41=[CAST($t40):INTEGER NOT NULL], expr#42=[/($t41, $t35)], proj#0..3=[{exprs}], STORE_SALES_QUANTITY
 AVE=[$t11], STORE_SALES_QUANTITYSTDEV=[$t23], STORE_SALES_QUANTITYCOV=[$t24], AS_STORE_RETURNS_QUANTITYCOUNT=[$t3], AS_STORE_RETURNS_QUANTITYAVE=[$t26], AS_STORE_RETURNS_QUANTITYSTDEV=[$t32], STORE_RETURNS_QUANTITYCOV=[$t33], CATALOG_SALES_QUANTITYCOUNT=[$t3], CATALOG_SALES_QUANTITYAVE=[$t35], CATALOG_SALES_QUANTITYSTDEV=[$t42], CATALOG_SALES_QUANTITYCOV=[$t42]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 2.555295485909236E30 cpu, 0.0 io}\n"
-                + "  EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 2.555295485909236E30 cpu, 0.0 io}\n"
-                + "    EnumerableSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]): rowcount = 5.434029018852197E26, cumulative cost = {1.2435775409784036E28 rows, 2.555295485909236E30 cpu, 0.0 io}\n"
+                + "EnumerableCalc(expr#0..9=[{inputs}], expr#10=[/($t4, $t3)], expr#11=[CAST($t10):INTEGER NOT NULL], expr#12=[*($t4, $t4)], expr#13=[/($t12, $t3)], expr#14=[-($t5, $t13)], expr#15=[1], expr#16=[=($t3, $t15)], expr#17=[null:BIGINT], expr#18=[-($t3, $t15)], expr#19=[CASE($t16, $t17, $t18)], expr#20=[/($t14, $t19)], expr#21=[0.5:DECIMAL(2, 1)], expr#22=[POWER($t20, $t21)], expr#23=[CAST($t22):INTEGER NOT NULL], expr#24=[/($t23, $t11)], expr#25=[/($t6, $t3)], expr#26=[CAST($t25):INTEGER NOT NULL], expr#27=[*($t6, $t6)], expr#28=[/($t27, $t3)], expr#29=[-($t7, $t28)], expr#30=[/($t29, $t19)], expr#31=[POWER($t30, $t21)], expr#32=[CAST($t31):INTEGER NOT NULL], expr#33=[/($t32, $t26)], expr#34=[/($t8, $t3)], expr#35=[CAST($t34):INTEGER NOT NULL], expr#36=[*($t8, $t8)], expr#37=[/($t36, $t3)], expr#38=[-($t9, $t37)], expr#39=[/($t38, $t19)], expr#40=[POWER($t39, $t21)], expr#41=[CAST($t40):INTEGER NOT NULL], expr#42=[/($t41, $t35)], proj#0..3=[{exprs}], STORE_SALES_QUANTITY
 AVE=[$t11], STORE_SALES_QUANTITYSTDEV=[$t23], STORE_SALES_QUANTITYCOV=[$t24], AS_STORE_RETURNS_QUANTITYCOUNT=[$t3], AS_STORE_RETURNS_QUANTITYAVE=[$t26], AS_STORE_RETURNS_QUANTITYSTDEV=[$t32], STORE_RETURNS_QUANTITYCOV=[$t33], CATALOG_SALES_QUANTITYCOUNT=[$t3], CATALOG_SALES_QUANTITYAVE=[$t35], CATALOG_SALES_QUANTITYSTDEV=[$t42], CATALOG_SALES_QUANTITYCOV=[$t42]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 1.317577974149368E30 cpu, 0.0 io}\n"
+                + "  EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 1.317577974149368E30 cpu, 0.0 io}\n"

Review comment:
       I filed https://github.com/apache/calcite/pull/2368 to remove the cost and row attributes from the test




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591508784



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {

Review comment:
       This method cannot make to static, becase the convertSql
   ```
     private RelNode convertSql(String sql, boolean typeCoercion) {
       final Tester tester = typeCoercion ? this.tester : this.strictTester;
       return convertSql(tester, sql);
     }
   ```

##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost, "offset + fetch smaller than table size "
+        + "=> cpu cost should be: input_size * log(offset + fetch) * row_bytes");
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d, "no order by clause => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d, "fetch zero => cpu cost should be 0");
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost, "sort limit exceeds table size "
+        + "=> cpu cost should be dominated by table size");
+  }
+
+  private void checkCpuCost(String sql, double expected, String reason) {

Review comment:
       This method cannot make to static, becase of the convertSql
   ```
     private RelNode convertSql(String sql, boolean typeCoercion) {
       final Tester tester = typeCoercion ? this.tester : this.strictTester;
       return convertSql(tester, sql);
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591201550



##########
File path: core/src/test/java/org/apache/calcite/test/RelMetadataTest.java
##########
@@ -3412,4 +3414,41 @@ public String colType(RelNode rel, int column) {
     assertThat(columnOrigin.getOriginTable().getRowType().getFieldNames().get(5),
         equalTo("SAL"));
   }
+
+  // ----------------------------------------------------------------------
+  // Tests for SortCpuCost
+  // ----------------------------------------------------------------------
+
+  @Test void testSortCpuCostOffsetLimit() {
+    final String sql = "select ename from emp order by ename limit 5 offset 5";
+    double cpuCost = EMP_SIZE * Math.log(10) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  @Test void testSortCpuCostLimit() {
+    final String sql = "select ename from emp limit 10";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLimit0() {
+    final String sql = "select ename from emp order by ename limit 0";
+    checkCpuCost(sql, 0d);
+  }
+
+  @Test void testSortCpuCostLargeLimit() {
+    final String sql = "select ename from emp order by ename limit 10000";
+    double cpuCost = EMP_SIZE * Math.log(EMP_SIZE) * 4;
+    checkCpuCost(sql, cpuCost);
+  }
+
+  private void checkCpuCost(String sql, double expected) {
+    RelNode rel = convertSql(sql);
+    assertThat("query should be sort", rel instanceof LogicalSort);
+    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
+    RelOptPlanner planner = new VolcanoPlanner();
+    RelOptCost cost = rel.computeSelfCost(planner, mq);
+    assertThat(cost, notNullValue());

Review comment:
       I know the cost won't be null, bu the ide warns me it could be. So I add the assert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] vlsi commented on a change in pull request #2363: [CALCITE-4522] Sort cost should account for the number of columns in collation

Posted by GitBox <gi...@apache.org>.

vlsi commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591731788



##########
File path: plus/src/test/java/org/apache/calcite/adapter/tpcds/TpcdsTest.java
##########
@@ -220,9 +220,9 @@ protected void foo(CalciteAssert.AssertThat with, String tableName,
         .withHook(Hook.PROGRAM, handler(true, 2))
         .explainMatches("including all attributes ",
             CalciteAssert.checkMaskedResultContains(""
-                + "EnumerableCalc(expr#0..9=[{inputs}], expr#10=[/($t4, $t3)], expr#11=[CAST($t10):INTEGER NOT NULL], expr#12=[*($t4, $t4)], expr#13=[/($t12, $t3)], expr#14=[-($t5, $t13)], expr#15=[1], expr#16=[=($t3, $t15)], expr#17=[null:BIGINT], expr#18=[-($t3, $t15)], expr#19=[CASE($t16, $t17, $t18)], expr#20=[/($t14, $t19)], expr#21=[0.5:DECIMAL(2, 1)], expr#22=[POWER($t20, $t21)], expr#23=[CAST($t22):INTEGER NOT NULL], expr#24=[/($t23, $t11)], expr#25=[/($t6, $t3)], expr#26=[CAST($t25):INTEGER NOT NULL], expr#27=[*($t6, $t6)], expr#28=[/($t27, $t3)], expr#29=[-($t7, $t28)], expr#30=[/($t29, $t19)], expr#31=[POWER($t30, $t21)], expr#32=[CAST($t31):INTEGER NOT NULL], expr#33=[/($t32, $t26)], expr#34=[/($t8, $t3)], expr#35=[CAST($t34):INTEGER NOT NULL], expr#36=[*($t8, $t8)], expr#37=[/($t36, $t3)], expr#38=[-($t9, $t37)], expr#39=[/($t38, $t19)], expr#40=[POWER($t39, $t21)], expr#41=[CAST($t40):INTEGER NOT NULL], expr#42=[/($t41, $t35)], proj#0..3=[{exprs}], STORE_SALES_QUANTITY
 AVE=[$t11], STORE_SALES_QUANTITYSTDEV=[$t23], STORE_SALES_QUANTITYCOV=[$t24], AS_STORE_RETURNS_QUANTITYCOUNT=[$t3], AS_STORE_RETURNS_QUANTITYAVE=[$t26], AS_STORE_RETURNS_QUANTITYSTDEV=[$t32], STORE_RETURNS_QUANTITYCOV=[$t33], CATALOG_SALES_QUANTITYCOUNT=[$t3], CATALOG_SALES_QUANTITYAVE=[$t35], CATALOG_SALES_QUANTITYSTDEV=[$t42], CATALOG_SALES_QUANTITYCOV=[$t42]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 2.555295485909236E30 cpu, 0.0 io}\n"
-                + "  EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 2.555295485909236E30 cpu, 0.0 io}\n"
-                + "    EnumerableSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]): rowcount = 5.434029018852197E26, cumulative cost = {1.2435775409784036E28 rows, 2.555295485909236E30 cpu, 0.0 io}\n"
+                + "EnumerableCalc(expr#0..9=[{inputs}], expr#10=[/($t4, $t3)], expr#11=[CAST($t10):INTEGER NOT NULL], expr#12=[*($t4, $t4)], expr#13=[/($t12, $t3)], expr#14=[-($t5, $t13)], expr#15=[1], expr#16=[=($t3, $t15)], expr#17=[null:BIGINT], expr#18=[-($t3, $t15)], expr#19=[CASE($t16, $t17, $t18)], expr#20=[/($t14, $t19)], expr#21=[0.5:DECIMAL(2, 1)], expr#22=[POWER($t20, $t21)], expr#23=[CAST($t22):INTEGER NOT NULL], expr#24=[/($t23, $t11)], expr#25=[/($t6, $t3)], expr#26=[CAST($t25):INTEGER NOT NULL], expr#27=[*($t6, $t6)], expr#28=[/($t27, $t3)], expr#29=[-($t7, $t28)], expr#30=[/($t29, $t19)], expr#31=[POWER($t30, $t21)], expr#32=[CAST($t31):INTEGER NOT NULL], expr#33=[/($t32, $t26)], expr#34=[/($t8, $t3)], expr#35=[CAST($t34):INTEGER NOT NULL], expr#36=[*($t8, $t8)], expr#37=[/($t36, $t3)], expr#38=[-($t9, $t37)], expr#39=[/($t38, $t19)], expr#40=[POWER($t39, $t21)], expr#41=[CAST($t40):INTEGER NOT NULL], expr#42=[/($t41, $t35)], proj#0..3=[{exprs}], STORE_SALES_QUANTITY
 AVE=[$t11], STORE_SALES_QUANTITYSTDEV=[$t23], STORE_SALES_QUANTITYCOV=[$t24], AS_STORE_RETURNS_QUANTITYCOUNT=[$t3], AS_STORE_RETURNS_QUANTITYAVE=[$t26], AS_STORE_RETURNS_QUANTITYSTDEV=[$t32], STORE_RETURNS_QUANTITYCOV=[$t33], CATALOG_SALES_QUANTITYCOUNT=[$t3], CATALOG_SALES_QUANTITYAVE=[$t35], CATALOG_SALES_QUANTITYSTDEV=[$t42], CATALOG_SALES_QUANTITYCOV=[$t42]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 1.317577974149368E30 cpu, 0.0 io}\n"
+                + "  EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = {1.2435775409784036E28 rows, 1.317577974149368E30 cpu, 0.0 io}\n"

Review comment:
       It is funny how this test always fails on plan changes.
   I added `slow-tests-needed` label since I know this test hard-codes all costing attributes.
   
   @julianhyde , would you mind if we remove "all attributes" and keep only the verification of the basic plan shape here? Is the verification of costs really important for the given plan `testQuery17Plan`?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591558511



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       I also confused by the bytesPerRow. 
   1. collation.getFieldCollations().size() seams better than project row size.
   2. RelMetadataQuery.getAverageRowSize also be better than project field count * 4 to compute bytesPerRow
   3. the log use base E, not 2.
   
   I have try to deal with above problem, but when I change it, some test failed. such as JdbcTest.testAlmostBushy. I do not know making a change will affect other class or not, so I just keep it. and only change nlogn to mlogn




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hqx871 commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

hqx871 commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591577136



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +126,24 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final double inCount = mq.getRowCount(input);
     // Higher cost if rows are wider discourages pushing a project through a
     // sort.
-    final double rowCount = mq.getRowCount(this);
     final double bytesPerRow = getRowType().getFieldCount() * 4;

Review comment:
       I use log2 to replace logE, this obviously increase the cpu cost. and the test JdbcTest.testAlmostBushy failed.
   ```
   /** Just short of bushy. */
     @Test void testAlmostBushy() {
       CalciteAssert.that()
           .with(CalciteAssert.Config.FOODMART_CLONE)
           .query("select *\n"
               + "from \"sales_fact_1997\" as s\n"
               + "join \"customer\" as c\n"
               + "  on s.\"customer_id\" = c.\"customer_id\"\n"
               + "join \"product\" as p\n"
               + "  on s.\"product_id\" = p.\"product_id\"\n"
               + "where c.\"city\" = 'San Francisco'\n"
               + "and p.\"brand_name\" = 'Washington'")
           .explainMatches("including all attributes ",
               CalciteAssert.checkMaskedResultContains(""
                   + "EnumerableMergeJoin(condition=[=($0, $38)], joinType=[inner]): rowcount = 7.050660528307499E8, cumulative cost = {7.656040129282498E8 rows, 5.0023949296644424E10 cpu, 0.0 io}\n"
                   + "  EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0087351932499997E7, cumulative cost = {4.044858016499999E7 rows, 5.0023896255644424E10 cpu, 0.0 io}\n"
                   + "    EnumerableMergeJoin(condition=[=($2, $8)], joinType=[inner]): rowcount = 2.0087351932499997E7, cumulative cost = {2.0361228232499994E7 rows, 3.232400376004586E7 cpu, 0.0 io}\n"
                   + "      EnumerableSort(sort0=[$2], dir0=[ASC]): rowcount = 86837.0, cumulative cost = {173674.0 rows, 3.168658076004586E7 cpu, 0.0 io}\n"
                   + "        EnumerableTableScan(table=[[foodmart2, sales_fact_1997]]): rowcount = 86837.0, cumulative cost = {86837.0 rows, 86838.0 cpu, 0.0 io}\n"
                   + "      EnumerableCalc(expr#0..28=[{inputs}], expr#29=['San Francisco':VARCHAR(30)], expr#30=[=($t9, $t29)], proj#0..28=[{exprs}], $condition=[$t30]): rowcount = 1542.1499999999999, cumulative cost = {11823.15 rows, 637423.0 cpu, 0.0 io}\n"
                   + "        EnumerableTableScan(table=[[foodmart2, customer]]): rowcount = 10281.0, cumulative cost = {10281.0 rows, 10282.0 cpu, 0.0 io}\n"
                   + "  EnumerableCalc(expr#0..14=[{inputs}], expr#15=['Washington':VARCHAR(60)], expr#16=[=($t2, $t15)], proj#0..14=[{exprs}], $condition=[$t16]): rowcount = 234.0, cumulative cost = {1794.0 rows, 53041.0 cpu, 0.0 io}\n"
                   + "    EnumerableTableScan(table=[[foodmart2, product]]): rowcount = 1560.0, cumulative cost = {1560.0 rows, 1561.0 cpu, 0.0 io}\n"));
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] rubenada commented on a change in pull request #2363: [CALCITE-4522] Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Posted by GitBox <gi...@apache.org>.

rubenada commented on a change in pull request #2363:
URL: https://github.com/apache/calcite/pull/2363#discussion_r591260387



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Sort.java
##########
@@ -124,11 +125,19 @@ public abstract Sort copy(RelTraitSet traitSet, RelNode newInput,
 
   @Override public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner,
       RelMetadataQuery mq) {
+    double rowCount = mq.getRowCount(this);
+    if (collation.getFieldCollations().isEmpty()) {
+      return planner.getCostFactory().makeCost(rowCount, 0, 0);
+    }
+    final int offsetValue = offset == null ? 0 : RexLiteral.intValue(offset);

Review comment:
       Theoretically, `offset` could not be a `RexLiteral` (e.g. it could be a `RexCall` or a `RexDynamicParam`), which would lead to an `AssertionError` when calling `RexLiteral.intValue`. To be on the safe side, this should be checked before calling this function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org