You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2020/02/27 05:45:56 UTC

[GitHub] [calcite] cshuo opened a new pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

cshuo opened a new pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835
 
 
   In planner optimization, the digest  of Aggregate node contains digest of its AggregateCall, i.e. Aggregate.toString, but currently 'approximate' filed of AggregateCall is not considered in toString() method, which may lead to the situation two different relNodes are considered as identical in planner optimizing phase. 
   
   Here is an example:
   ```sql
   select * from (
     select a, count(distinct b) from T group by a
     union all
     select a, approx_count_distinct(b) from T group by a
   )
   ```
   After applying a rule, the plan is:
   ```sql
   LogicalSink(name=[_DataStreamTable_1], fields=[a, EXPR$1], __id__=[96])
   +- LogicalProject(a=[$0], EXPR$1=[$1], __id__=[94])
      +- LogicalUnion(all=[true], __id__=[92])
         :- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
         :  +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
         +- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
            +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
   ```
   As showing in the example, after optimizing, these two Aggregates are considered as identical (both with 89 as relNode ID).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835#discussion_r385035808
 
 

 ##########
 File path: core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java
 ##########
 @@ -292,9 +292,15 @@ public AggregateCall rename(String name) {
   }
 
   public String toString() {
-    StringBuilder buf = new StringBuilder(aggFunction.toString());
+    StringBuilder buf = new StringBuilder();
+    // currently approximate = true is only for 'APPROX_COUNT_DISTINCT'
+    if (approximate && distinct) {
+      buf.append("APPROX_COUNT_DISTINCT");
+    } else {
 
 Review comment:
   Either is fine to me, go with your choice. BTW, that format of the distinct call would be like ? `distinct call_name(args)`  ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] cshuo commented on issue #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
cshuo commented on issue #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835#issuecomment-591806283
 
 
   Many plan tests containing 'APPROX_COUNT_DISTINCT' may be failing, I'm fixing this...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835#discussion_r385035808
 
 

 ##########
 File path: core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java
 ##########
 @@ -292,9 +292,15 @@ public AggregateCall rename(String name) {
   }
 
   public String toString() {
-    StringBuilder buf = new StringBuilder(aggFunction.toString());
+    StringBuilder buf = new StringBuilder();
+    // currently approximate = true is only for 'APPROX_COUNT_DISTINCT'
+    if (approximate && distinct) {
+      buf.append("APPROX_COUNT_DISTINCT");
+    } else {
 
 Review comment:
   Either is fine to me, go with your choice. BTW, that format of the approximate call would be like ? `approximate call_name(args)`  ? or `call_name()... filter $0 approximate` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] danny0405 closed pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
danny0405 closed pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835#discussion_r385035808
 
 

 ##########
 File path: core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java
 ##########
 @@ -292,9 +292,15 @@ public AggregateCall rename(String name) {
   }
 
   public String toString() {
-    StringBuilder buf = new StringBuilder(aggFunction.toString());
+    StringBuilder buf = new StringBuilder();
+    // currently approximate = true is only for 'APPROX_COUNT_DISTINCT'
+    if (approximate && distinct) {
+      buf.append("APPROX_COUNT_DISTINCT");
+    } else {
 
 Review comment:
   Either is fine to me, go with your choice. BTW, that format of the approximate call would be like ? `approximate call_name(args)`  ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] cshuo commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
cshuo commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835#discussion_r385027740
 
 

 ##########
 File path: core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java
 ##########
 @@ -292,9 +292,15 @@ public AggregateCall rename(String name) {
   }
 
   public String toString() {
-    StringBuilder buf = new StringBuilder(aggFunction.toString());
+    StringBuilder buf = new StringBuilder();
+    // currently approximate = true is only for 'APPROX_COUNT_DISTINCT'
+    if (approximate && distinct) {
+      buf.append("APPROX_COUNT_DISTINCT");
+    } else {
 
 Review comment:
   I would prefer to add some distinct flag to AggregateCall rather than  Aggregate,  because  'approximate' is a property of AggregateCall, not Aggregate. What do you think, Danny?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #1835: [CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall
URL: https://github.com/apache/calcite/pull/1835#discussion_r385005652
 
 

 ##########
 File path: core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java
 ##########
 @@ -292,9 +292,15 @@ public AggregateCall rename(String name) {
   }
 
   public String toString() {
-    StringBuilder buf = new StringBuilder(aggFunction.toString());
+    StringBuilder buf = new StringBuilder();
+    // currently approximate = true is only for 'APPROX_COUNT_DISTINCT'
+    if (approximate && distinct) {
+      buf.append("APPROX_COUNT_DISTINCT");
+    } else {
 
 Review comment:
   Agreed with Julian, we should not modify the agg call name which is a unique identifier of the function, better to add some distinct flag to the Aggregate digest instead:
   ```java
   public RelWriter explainTerms(RelWriter pw) {
       // We skip the "groups" element if it is a singleton of "group".
       super.explainTerms(pw)
           .item("group", groupSet)
           .itemIf("groups", groupSets, getGroupType() != Group.SIMPLE)
           .itemIf("aggs", aggCalls, pw.nest());
       if (!pw.nest()) {
         for (Ord<AggregateCall> ord : Ord.zip(aggCalls)) {
           pw.item(Util.first(ord.e.name, "agg#" + ord.i), ord.e);
         }
       }
       return pw;
     }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services