Posted to notifications@ignite.apache.org by GitBox <gi...@apache.org> on 2021/04/29 19:43:40 UTC

[GitHub] [ignite] tledkov-gridgain opened a new pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

tledkov-gridgain opened a new pull request #9070:
URL: https://github.com/apache/ignite/pull/9070


   Thank you for submitting the pull request to Apache Ignite.
   
   In order to streamline the review of the contribution 
   we ask you to ensure the following steps have been taken:
   
   ### The Contribution Checklist
   - [ ] There is a single JIRA ticket related to the pull request. 
   - [ ] The web-link to the pull request is attached to the JIRA ticket.
   - [ ] The JIRA ticket has the _Patch Available_ state.
   - [ ] The pull request body describes the changes that have been made. 
   The description explains _WHAT_ was changed and _WHY_, rather than _HOW_.
   - [ ] The pull request title is treated as the final commit message. 
   The following pattern must be used: `IGNITE-XXXX Change summary` where `XXXX` - number of JIRA issue.
   - [ ] A reviewer has been mentioned through the JIRA comments 
   (see [the Maintainers list](https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute#HowtoContribute-ReviewProcessandMaintainers)) 
   - [ ] The pull request has been checked by the Teamcity Bot and 
   the `green visa` attached to the JIRA ticket (see [TC.Bot: Check PR](https://mtcga.gridgain.com/prs.html))
   
   ### Notes
   - [How to Contribute](https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute)
   - [Coding abbreviation rules](https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules)
   - [Coding Guidelines](https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines)
   - [Apache Ignite Teamcity Bot](https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+Teamcity+Bot)
   
   If you need any help, please email dev@ignite.apache.org or ask for advice on http://asf.slack.com in the _#ignite_ channel.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] zstan commented on a change in pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

Posted by GitBox <gi...@apache.org>.
zstan commented on a change in pull request #9070:
URL: https://github.com/apache/ignite/pull/9070#discussion_r642789988



##########
File path: modules/calcite/src/main/java/org/apache/ignite/internal/processors/query/calcite/exec/exp/agg/Accumulators.java
##########
@@ -1003,4 +1016,53 @@ private DecimalMinMax(boolean min) {
             return typeFactory.createTypeWithNullability(typeFactory.createSqlType(DECIMAL), true);
         }
     }
+
+    /** */
+    private static class DistinctAccumulator implements Accumulator {
+        /** */
+        private final Accumulator acc;
+
+        /** */
+        private final Set<Object> set = new HashSet<>();
+
+        /** */
+        private DistinctAccumulator(Supplier<Accumulator> accSup) {
+            this.acc = accSup.get();
+        }
+
+        /** {@inheritDoc} */
+        @Override public void add(Object... args) {
+            Object in = args[0];
+
+            if (in == null)
+                return;
+
+            set.add(in);
+        }
+
+        /** {@inheritDoc} */
+        @Override public void apply(Accumulator other) {
+            DistinctAccumulator other0 = (DistinctAccumulator) other;

Review comment:
       The space after the cast is redundant.
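
For context, a minimal sketch of how a distinct-wrapping accumulator of this shape might behave, with the cast written without the space. Only `add` and the first line of `apply` appear in the diff above; the `Accumulator` interface here is a simplified, hypothetical stand-in, and the `LongSum` class, the set merge in `apply`, and the `end` logic are assumptions made for illustration, not the code from the pull request.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

public class DistinctSketch {
    /** Simplified, hypothetical stand-in for the engine's accumulator interface. */
    public interface Accumulator {
        void add(Object... args);

        void apply(Accumulator other);

        Object end();
    }

    /** Plain SUM over numeric values, used as the wrapped accumulator. Ignores nulls. */
    public static class LongSum implements Accumulator {
        private long sum;

        private boolean empty = true;

        @Override public void add(Object... args) {
            Object in = args[0];

            if (in == null)
                return;

            sum += ((Number)in).longValue();
            empty = false;
        }

        @Override public void apply(Accumulator other) {
            LongSum other0 = (LongSum)other;

            if (other0.empty)
                return;

            sum += other0.sum;
            empty = false;
        }

        @Override public Object end() {
            return empty ? null : sum;
        }
    }

    /** Collects distinct non-null inputs and feeds them to the wrapped accumulator at the end. */
    public static class DistinctAccumulator implements Accumulator {
        private final Accumulator acc;

        private final Set<Object> set = new HashSet<>();

        public DistinctAccumulator(Supplier<Accumulator> accSup) {
            acc = accSup.get();
        }

        @Override public void add(Object... args) {
            Object in = args[0];

            if (in == null)
                return;

            set.add(in);
        }

        @Override public void apply(Accumulator other) {
            // Cast without a space, as the review comment asks.
            DistinctAccumulator other0 = (DistinctAccumulator)other;

            // Assumption: merging two partial states is a set union.
            set.addAll(other0.set);
        }

        /** Assumption: called once; each distinct value reaches the wrapped accumulator exactly once. */
        @Override public Object end() {
            for (Object o : set)
                acc.add(o);

            return acc.end();
        }
    }

    public static void main(String[] args) {
        Accumulator a = new DistinctAccumulator(LongSum::new);
        Accumulator b = new DistinctAccumulator(LongSum::new);

        a.add(1L);
        a.add(1L);
        a.add(2L);
        b.add(2L);
        b.add(3L);

        a.apply(b);

        System.out.println(a.end()); // prints 6
    }
}
```

Note that the set keeps every distinct value until `end` is called, so the state of such an accumulator grows with the number of distinct inputs.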







[GitHub] [ignite] ygerzhedovich commented on a change in pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

Posted by GitBox <gi...@apache.org>.
ygerzhedovich commented on a change in pull request #9070:
URL: https://github.com/apache/ignite/pull/9070#discussion_r639575219



##########
File path: modules/calcite/src/main/java/org/apache/ignite/internal/processors/query/calcite/CalciteQueryProcessor.java
##########
@@ -77,7 +79,14 @@
             // so it's better to disable such rewriting right now
             // TODO: remove this after IGNITE-14277
             .withInSubQueryThreshold(Integer.MAX_VALUE)
-            .withDecorrelationEnabled(true))
+            .withDecorrelationEnabled(true)
+            .withHintStrategyTable(
+                HintStrategyTable.builder()
+                    .hintStrategy("DISABLE_RULE", (hint, rel) -> true)
+                    .hintStrategy("EXPAND_DISTINCT_AGG", (hint, rel) -> rel instanceof Aggregate)

Review comment:
       Maybe it's better to extract the names of the hint strategies to a separate place.
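
One possible shape for this suggestion, as a sketch only: the enum name `HintDefinition` and its `find` helper are hypothetical and are not taken from the pull request. The two constants mirror the string literals registered in the diff above.

```java
import java.util.Arrays;
import java.util.Optional;

public class HintNames {
    /** Hypothetical enum that keeps all supported hint names in one place. */
    public enum HintDefinition {
        /** Hint name registered with an always-true predicate in the diff. */
        DISABLE_RULE,

        /** Hint name registered for Aggregate nodes in the diff. */
        EXPAND_DISTINCT_AGG;

        /** Case-insensitive lookup by the name used in the SQL hint comment. */
        public static Optional<HintDefinition> find(String name) {
            return Arrays.stream(values())
                .filter(h -> h.name().equalsIgnoreCase(name))
                .findFirst();
        }
    }

    public static void main(String[] args) {
        System.out.println(HintDefinition.find("expand_distinct_agg").isPresent()); // prints true
    }
}
```

With such an enum, the registration could pass `HintDefinition.EXPAND_DISTINCT_AGG.name()` instead of the string literal, so the planner configuration and any code that checks for the hint refer to the same constant.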







[GitHub] [ignite] alex-plekhanov commented on a change in pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

Posted by GitBox <gi...@apache.org>.
alex-plekhanov commented on a change in pull request #9070:
URL: https://github.com/apache/ignite/pull/9070#discussion_r635559297



##########
File path: modules/calcite/src/test/java/org/apache/ignite/internal/processors/query/calcite/planner/AggregatePlannerTest.java
##########
@@ -209,6 +213,42 @@ public void distribution() throws Exception {
         assertEquals(IgniteDistributions.single(), TraitUtils.distribution(rdcAgg));
     }
 
+    /**
+     * @throws Exception If failed.
+     */
+    @Test
+    public void expandDistinctAggregates() throws Exception {
+        TestTable tbl = createAffinityTable()
+            .addIndex(RelCollations.of(ImmutableIntList.of(3, 1, 0)), "idx_val0")
+            .addIndex(RelCollations.of(ImmutableIntList.of(3, 2, 0)), "idx_val1");
+
+        IgniteSchema publicSchema = new IgniteSchema("PUBLIC");
+
+        publicSchema.addTable("TEST", tbl);
+
+        String sql = "SELECT /*+ EXPAND_DISTINCT_AGG */ SUM(DISTINCT val0), AVG(DISTINCT val1) FROM test GROUP BY grp0";

Review comment:
       There is a weird plan for this query:
   ```
   IgniteProject(SUM(DISTINCT VAL0)=[$3], AVG(DISTINCT VAL1)=[$1]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=623.0, cpu=629.0, memory=34.0, io=0.0, network=1600.0], id = 91734
     IgniteMergeJoin(condition=[IS NOT DISTINCT FROM($2, $0)], joinType=[inner], variablesSet=[[]], leftCollation=[[0]], rightCollation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=622.0, cpu=628.0, memory=34.0, io=0.0, network=1600.0], id = 91733
       IgniteSingleSortAggregate(group=[{0}], AVG(DISTINCT VAL1)=[AVG($1)], collation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=310.0, cpu=310.0, memory=17.0, io=0.0, network=800.0], id = 91729
         IgniteSingleSortAggregate(group=[{0, 1}], collation=[[0, 1]]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0, memory=8.0, io=0.0, network=800.0], id = 91728
           IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0, network=800.0], id = 91727
             IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val1], projects=[[$t1, $t0]], requiredColumns=[{2, 3}]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], id = 230
       IgniteSingleSortAggregate(group=[{0}], SUM(DISTINCT VAL0)=[SUM($1)], collation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=310.0, cpu=310.0, memory=17.0, io=0.0, network=800.0], id = 91732
         IgniteSingleSortAggregate(group=[{0, 1}], collation=[[0, 1]]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0, memory=8.0, io=0.0, network=800.0], id = 91731
           IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0, network=800.0], id = 91730
             IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val0], projects=[[$t1, $t0]], requiredColumns=[{1, 3}]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], id = 440
   ```
   A single sort aggregate is used instead of map/reduce. Even if I disable the single sort aggregate rule, the plan is still weird, because it uses the map phase after the exchange:
   ```
           IgniteReduceSortAggregate(rowType=[RecordType(JavaType(class java.lang.Integer) GRP0, JavaType(class java.lang.Integer) VAL0)], group=[{0, 1}], collation=[[0, 1]]): rowcount = 10.0, cumulative cost = IgniteCost[rowCount=310.0, cpu=310.0, memory=8.0, io=0.0, network=800.0], id = 91357
             IgniteMapSortAggregate(group=[{0, 1}], collation=[[0, 1]]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0, memory=8.0, io=0.0, network=800.0], id = 91356
               IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0, network=800.0], id = 91355
                 IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val0], projects=[[$t1, $t0]], requiredColumns=[{1, 3}]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], id = 437
   ``` 
   Perhaps there is something wrong with the cost calculation for aggregates. I think that for sort aggregates in this test, the cost with "expand distinct" should be better than without it, and the rule should be applied without the hint. But now, without the hint, the cost is much better and the rule is not applied automatically (perhaps without the hint it will never be applied for other queries either). The plan without the hint:
   ```
   IgniteProject(SUM(DISTINCT VAL0)=[$1], AVG(DISTINCT VAL1)=[$2]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=230.0, cpu=230.0, memory=24.0, io=0.0, network=80.0], id = 42935
     IgniteReduceSortAggregate(rowType=[RecordType(JavaType(class java.lang.Integer) GRP0, JavaType(class java.lang.Integer) SUM(DISTINCT VAL0), JavaType(class java.lang.Integer) AVG(DISTINCT VAL1))], group=[{0}], SUM(DISTINCT VAL0)=[SUM(DISTINCT $1)], AVG(DISTINCT VAL1)=[AVG(DISTINCT $2)], collation=[[0]]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=220.0, cpu=220.0, memory=24.0, io=0.0, network=80.0], id = 42934
       IgniteExchange(distribution=[single]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=210.0, cpu=210.0, memory=24.0, io=0.0, network=80.0], id = 42933
         IgniteMapSortAggregate(group=[{0}], SUM(DISTINCT VAL0)=[SUM(DISTINCT $1)], AVG(DISTINCT VAL1)=[AVG(DISTINCT $2)], collation=[[0]]): rowcount = 10.0, cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=24.0, io=0.0, network=0.0], id = 42932
           IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val1], projects=[[$t2, $t0, $t1]], requiredColumns=[{1, 2, 3}]): rowcount = 100.0, cumulative cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], id = 273
   ```
   In this plan, `DISTINCT VAL0` and `DISTINCT VAL1` will require hash maps for each aggregate call on both the map and reduce phases, but the memory consumption for the whole plan is only 24 bytes.
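
To make the comparison concrete, here is a small sketch of the two evaluation strategies for a single `SUM(DISTINCT val0) ... GROUP BY grp0` call. It is plain Java over in-memory rows rather than planner code; the class and method names are hypothetical, and rows are modeled as `int[]{grp0, val0}`. The first method keeps a hash set per group, which is the state a DISTINCT accumulator would need; the second mimics the expanded form seen in the hinted plan, where an inner aggregate groups by `(grp0, val0)` and an outer aggregate applies a plain `SUM`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ExpandDistinctSketch {
    /** Direct evaluation: one hash set of seen values per group. */
    public static Map<Integer, Long> sumDistinctDirect(List<int[]> rows) {
        Map<Integer, Set<Integer>> seen = new TreeMap<>();

        for (int[] r : rows)
            seen.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);

        Map<Integer, Long> res = new TreeMap<>();

        seen.forEach((grp, vals) -> res.put(grp, vals.stream().mapToLong(Integer::longValue).sum()));

        return res;
    }

    /** Expanded evaluation: deduplicate (grp0, val0) pairs first, then run a plain SUM per group. */
    public static Map<Integer, Long> sumDistinctExpanded(List<int[]> rows) {
        // Inner step: corresponds to an aggregate with group=[{0, 1}] and no aggregate calls.
        Set<List<Integer>> pairs = new LinkedHashSet<>();

        for (int[] r : rows)
            pairs.add(Arrays.asList(r[0], r[1]));

        // Outer step: corresponds to group=[{0}] with SUM($1), no DISTINCT needed anymore.
        Map<Integer, Long> res = new TreeMap<>();

        for (List<Integer> p : pairs)
            res.merge(p.get(0), p.get(1).longValue(), Long::sum);

        return res;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[] {1, 10}, new int[] {1, 10}, new int[] {1, 20},
            new int[] {2, 5}, new int[] {2, 5});

        System.out.println(sumDistinctDirect(rows));   // prints {1=30, 2=5}
        System.out.println(sumDistinctExpanded(rows)); // prints {1=30, 2=5}
    }
}
```

Both methods return the same result. The difference relevant to the cost discussion is where the deduplication state lives: in the direct form it sits inside every aggregate call, so a cost model that ignores it will underestimate memory.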

##########
File path: modules/calcite/src/main/java/org/apache/ignite/internal/processors/query/calcite/prepare/PlannerPhase.java
##########
@@ -134,6 +135,8 @@
                                         .predicate(Aggregate::isSimple)
                                         .anyInputs())).toRule(),
 
+                    AggregateExpandDistinctAggregatesRule.Config.JOIN.toRule(),

Review comment:
       Perhaps it's more readable to use the already created instance `CoreRules.AGGREGATE_EXPAND_DISTINCT_AGGREGATES_TO_JOIN`.







[GitHub] [ignite] tledkov-gridgain merged pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

Posted by GitBox <gi...@apache.org>.
tledkov-gridgain merged pull request #9070:
URL: https://github.com/apache/ignite/pull/9070


   

