You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/01/03 19:03:49 UTC

[GitHub] [druid] gianm opened a new pull request #9122: Add SQL GROUPING SETS support.

gianm opened a new pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122

Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:

- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
each grouping set. This is more in line with SQL standard behavior. I think it is okay
to make this change, since the old behavior was not documented, so users should
hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
should not have been.

Also fixes two bugs in query equality checking:

- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384126605
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception
     );
   }
 
+  @Test
+  public void testGroupingSets() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithNumericDimension() throws Exception
+  {
+    testQuery(
+        "SELECT cnt, COUNT(*)\n"
+        + "FROM foo\n"
+        + "GROUP BY GROUPING SETS ( (cnt), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG)))
+                        .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("d0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1L, 6L},
+            new Object[]{null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByRollup() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY ROLLUP (dim2, gran)",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByCube() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY CUBE (dim2, gran)",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithDummyDimension() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, 'dummy', gran), (dim2), (gran), ('dummy') )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v2",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v2", "v2", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v2"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of(),
+                                ImmutableList.of("v2")
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, null, 6L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsNoSuperset() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    // Note: the grouping sets are reordered in the output of this query, but this is allowed.
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByDimension() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY gran, dim2 DESC",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "v1",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    ),
+                                    new OrderByColumnSpec(
+                                        "v0",
+                                        Direction.DESCENDING,
+                                        StringComparators.LEXICOGRAPHIC
+                                    )
+                                ),
+                                Integer.MAX_VALUE
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"", null, 3L},
+            new Object[]{NULL_VALUE, null, 6L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByAggregator() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY SUM(cnt)\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                ),
+                                Integer.MAX_VALUE
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"", null, 3L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByAggregatorWithLimit() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY SUM(cnt)\n"
+        + "LIMIT 1",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                ),
+                                1
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L}
+        )
+    );
+  }
+
 
 Review comment:
   I don't think I read a test that did this explicitly. `testGroupingSetsWithDummyDimension` appears to do this where the `dummy` column isn't included in the subTotalsSpec. I guess I was just asking - is there anything special about calling `GROUPING SETS` only on a non-existent column? Should it just behave like you called GROUPING SETS on nothing?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] himanshug commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

himanshug commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-591132212
 
 
   > I'd like to make that addition, but probably not in this patch. For now, you think I should add the 'legacy' flag or do you think I should document that things will get jumbled if you have a limitSpec? Personally, I'd rather do the latter, if people don't have use cases that depend on the legacy limitSpec behavior.
   
   I'm fine with whatever you prefer as it is confirmed that behavior  is not being used.  Added `Release Notes`  tag(updated description as well) to this  PR so that this gets mentioned in release notes and other users of subtotalsSpec can get a chance shout out when this goes in a release-candidate.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384088026
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/groupby/GroupByQuery.java
 ##########
 @@ -1236,7 +1243,8 @@ public int hashCode()
         dimFilter,
         dimensions,
         aggregatorSpecs,
-        postAggregatorSpecs
+        postAggregatorSpecs,
+        subtotalsSpec
 
 Review comment:
   OK, I added it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384087132
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Subtotals.java
 ##########
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rel;
+
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.query.dimension.DimensionSpec;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents the Druid groupBy query concept of subtotals, which is similar to GROUPING SETS.
+ */
+public class Subtotals
+{
+  /**
+   * List of subtotals: each one is a list of dimension indexes. (i.e. [0, 1] means use the first and second
+   * dimensions).
+   */
+  private final List<IntList> subtotals;
+
+  Subtotals(List<IntList> subtotals)
+  {
+    this.subtotals = subtotals;
+  }
+
+  public List<IntList> getSubtotals()
+  {
+    return subtotals;
+  }
+
+  @Nullable
+  public List<List<String>> toSubtotalsSpec(final List<DimensionSpec> dimensions)
+  {
+    if (hasEffect(dimensions)) {
+      return subtotals.stream()
+                      .map(
+                          subtotalInts -> {
+                            final List<String> subtotalDimensionNames = new ArrayList<>();
+                            for (int dimIndex : subtotalInts) {
+                              subtotalDimensionNames.add(dimensions.get(dimIndex).getOutputName());
+                            }
+                            return subtotalDimensionNames;
+                          }
+                      )
+                      .collect(Collectors.toList());
+    } else {
+      return null;
+    }
+  }
+
+  /**
+   * Returns whether this subtotals spec has an effect, and cannot be ignored.
+   */
+  public boolean hasEffect(final List<DimensionSpec> dimensionSpecs)
+  {
+    if (subtotals.isEmpty() || (subtotals.size() == 1 && subtotals.get(0).size() == dimensionSpecs.size())) {
+      return false;
+    } else {
+      return true;
+    }
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    Subtotals subtotals1 = (Subtotals) o;
+    return subtotals.equals(subtotals1.subtotals);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(subtotals);
+  }
 
 Review comment:
   I don't expect them to be called very often, and usually in that case, I don't bother memoizing (extra code to get right).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384213561
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java
 ##########
 @@ -92,19 +104,25 @@ private Grouping(
 
   public static Grouping create(
       final List<DimensionExpression> dimensions,
+      final Subtotals subtotals,
       final List<Aggregation> aggregations,
       final DimFilter havingFilter,
       final RowSignature outputRowSignature
   )
   {
-    return new Grouping(dimensions, aggregations, havingFilter, outputRowSignature);
+    return new Grouping(dimensions, subtotals, aggregations, havingFilter, outputRowSignature);
 
 Review comment:
   Yes, it should, I'll add it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383457866
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Subtotals.java
 ##########
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rel;
+
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.query.dimension.DimensionSpec;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents the Druid groupBy query concept of subtotals, which is similar to GROUPING SETS.
+ */
+public class Subtotals
+{
+  /**
+   * List of subtotals: each one is a list of dimension indexes. (i.e. [0, 1] means use the first and second
+   * dimensions).
+   */
+  private final List<IntList> subtotals;
+
+  Subtotals(List<IntList> subtotals)
+  {
+    this.subtotals = subtotals;
+  }
+
+  public List<IntList> getSubtotals()
+  {
+    return subtotals;
+  }
+
+  @Nullable
+  public List<List<String>> toSubtotalsSpec(final List<DimensionSpec> dimensions)
+  {
+    if (hasEffect(dimensions)) {
+      return subtotals.stream()
+                      .map(
+                          subtotalInts -> {
+                            final List<String> subtotalDimensionNames = new ArrayList<>();
+                            for (int dimIndex : subtotalInts) {
+                              subtotalDimensionNames.add(dimensions.get(dimIndex).getOutputName());
+                            }
+                            return subtotalDimensionNames;
+                          }
+                      )
+                      .collect(Collectors.toList());
+    } else {
+      return null;
+    }
+  }
+
+  /**
+   * Returns whether this subtotals spec has an effect, and cannot be ignored.
+   */
+  public boolean hasEffect(final List<DimensionSpec> dimensionSpecs)
+  {
+    if (subtotals.isEmpty() || (subtotals.size() == 1 && subtotals.get(0).size() == dimensionSpecs.size())) {
+      return false;
+    } else {
+      return true;
+    }
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    Subtotals subtotals1 = (Subtotals) o;
+    return subtotals.equals(subtotals1.subtotals);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(subtotals);
+  }
 
 Review comment:
   equals and hashCode for `List<List>` can be expensive. Is this concerning? Should we create a memoized string that represents the lists  so that the check can be faster? Or is that overkill?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-591077520
 
 
   @gianm looks good. Thank you

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm merged pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm merged pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384129305
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
 ##########
 @@ -369,13 +374,13 @@ public boolean doMergeResults(final GroupByQuery query)
                    .map(AggregatorFactory::getCombiningFactory)
                    .collect(Collectors.toList())
           )
-          .withSubtotalsSpec(null)
-          .withDimFilter(null);
-
+          .withVirtualColumns(VirtualColumns.EMPTY)
+          .withDimFilter(null)
+          .withSubtotalsSpec(null);
 
 Review comment:
   Sorry, I should have phrased that better.
   
   I meant that there seems to be a lot of overlap between the `.with...` functions and the `GroupByQuery.Builder` functions. When I saw that pattern, I was wondering how do you choose between the 2. And since both exist, I may not pick the correct one.
   
   Nothing to change here - just my stream of thought while reading the code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383428087
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/groupby/GroupByQuery.java
 ##########
 @@ -1236,7 +1243,8 @@ public int hashCode()
         dimFilter,
         dimensions,
         aggregatorSpecs,
-        postAggregatorSpecs
+        postAggregatorSpecs,
+        subtotalsSpec
 
 Review comment:
   EqualsVerifier Test for this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383431491
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure:
 SELECT [ ALL | DISTINCT ] { * | exprs }
 FROM table
 [ WHERE expr ]
-[ GROUP BY exprs ]
+[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
 
 Review comment:
   This is cool...  I learnt some new SQL today 👍 
   
   ```suggestion
   [ GROUP BY [ exprs | GROUPING SETS ( \(exprs\), ... ) | ROLLUP \(exprs\) | CUBE \(exprs\) ] ]
   ```
   
   Based on the examples below, the parenthesis are required correct?
   
   I think in other places in the docs parenthesis don't mean literal characters, but you want them to be literal characters here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-591597665
 
 
   The followup: https://github.com/apache/druid/issues/9410

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384135121
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer
 can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected
 column).
 
+The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS,
+for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city`
+followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to
+better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For
+example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )`
+and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand
+total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example,
+`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`.
+Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing
+`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the
+"country" and "city" columns.
+
+When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the
 
 Review comment:
   No need to change if this is an existing pattern

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384126724
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Subtotals.java
 ##########
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rel;
+
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.query.dimension.DimensionSpec;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents the Druid groupBy query concept of subtotals, which is similar to GROUPING SETS.
+ */
+public class Subtotals
+{
+  /**
+   * List of subtotals: each one is a list of dimension indexes. (i.e. [0, 1] means use the first and second
+   * dimensions).
+   */
+  private final List<IntList> subtotals;
+
+  Subtotals(List<IntList> subtotals)
+  {
+    this.subtotals = subtotals;
+  }
+
+  public List<IntList> getSubtotals()
+  {
+    return subtotals;
+  }
+
+  @Nullable
+  public List<List<String>> toSubtotalsSpec(final List<DimensionSpec> dimensions)
+  {
+    if (hasEffect(dimensions)) {
+      return subtotals.stream()
+                      .map(
+                          subtotalInts -> {
+                            final List<String> subtotalDimensionNames = new ArrayList<>();
+                            for (int dimIndex : subtotalInts) {
+                              subtotalDimensionNames.add(dimensions.get(dimIndex).getOutputName());
+                            }
+                            return subtotalDimensionNames;
+                          }
+                      )
+                      .collect(Collectors.toList());
+    } else {
+      return null;
+    }
+  }
+
+  /**
+   * Returns whether this subtotals spec has an effect, and cannot be ignored.
+   */
+  public boolean hasEffect(final List<DimensionSpec> dimensionSpecs)
+  {
+    if (subtotals.isEmpty() || (subtotals.size() == 1 && subtotals.get(0).size() == dimensionSpecs.size())) {
+      return false;
+    } else {
+      return true;
+    }
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    Subtotals subtotals1 = (Subtotals) o;
+    return subtotals.equals(subtotals1.subtotals);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(subtotals);
+  }
 
 Review comment:
   👍 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] a2l007 commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

a2l007 commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-590535288
 
 
   > However, I can't recall if that usecase
   did have needs for limitSpec. @a2l007 do you know if there is dependency of limit/sort spec behavior when subtotalsSpecs are used ?
   
   @himanshug The initial usecases for subtotalsSpec didn't require any sorting on the individual grouping sets. These usecases don't use limitSpec at all so it should be okay for this behavior change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384086213
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
 ##########
 @@ -369,13 +374,13 @@ public boolean doMergeResults(final GroupByQuery query)
                    .map(AggregatorFactory::getCombiningFactory)
                    .collect(Collectors.toList())
           )
-          .withSubtotalsSpec(null)
-          .withDimFilter(null);
-
+          .withVirtualColumns(VirtualColumns.EMPTY)
+          .withDimFilter(null)
+          .withSubtotalsSpec(null);
 
 Review comment:
   What do you mean by 'make sense'? It makes sense to me in that you have an immutable object (the GroupByQuery) and you are using chaining to create a derived copy.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384134632
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure:
 SELECT [ ALL | DISTINCT ] { * | exprs }
 FROM table
 [ WHERE expr ]
-[ GROUP BY exprs ]
+[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
 
 Review comment:
   I was thrown off by this in the docs 
   
   `[ EXPLAIN PLAN FOR ]
   [ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]`
   
   I didn't realize the parenthesis were required in this statement . I couldn't get an EXPLAIN query to work locally for me in the UI.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384192373
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java
 ##########
 @@ -141,36 +159,95 @@ public RowSignature getOutputRowSignature()
     return outputRowSignature;
   }
 
+  /**
+   * Applies a post-grouping projection.
+   *
+   * @see DruidQuery#computeGrouping which uses this
+   */
+  public Grouping applyProject(final PlannerContext plannerContext, final Project project)
+  {
+    final List<DimensionExpression> newDimensions = new ArrayList<>();
+    final List<Aggregation> newAggregations = new ArrayList<>(aggregations);
+    final Subtotals newSubtotals;
+
+    final Projection postAggregationProjection = Projection.postAggregation(
+        project,
+        plannerContext,
+        outputRowSignature,
+        "p"
+    );
+
+    postAggregationProjection.getPostAggregators().forEach(
+        postAggregator -> newAggregations.add(Aggregation.create(postAggregator))
+    );
+
+    // Remove literal dimensions that did not appear in the projection. This is useful for queries
+    // like "SELECT COUNT(*) FROM tbl GROUP BY 'dummy'" which some tools can generate, and for which we don't
+    // actually want to include a dimension 'dummy'.
+    final ImmutableBitSet aggregateProjectBits = RelOptUtil.InputFinder.bits(project.getChildExps(), null);
+    final int[] newDimIndexes = new int[dimensions.size()];
+
+    for (int i = 0; i < dimensions.size(); i++) {
+      final DimensionExpression dimension = dimensions.get(i);
+      if (Parser.parse(dimension.getDruidExpression().getExpression(), plannerContext.getExprMacroTable())
+                .isLiteral() && !aggregateProjectBits.get(i)) {
+        newDimIndexes[i] = -1;
+      } else {
+        newDimIndexes[i] = newDimensions.size();
+        newDimensions.add(dimension);
+      }
+    }
+
+    // Renumber subtotals, if needed, to account for removed dummy dimensions.
+    if (newDimensions.size() != dimensions.size()) {
+      final List<IntList> newSubtotalsList = new ArrayList<>();
+
+      for (IntList subtotal : subtotals.getSubtotals()) {
+        final IntList newSubtotal = new IntArrayList();
+        for (int dimIndex : subtotal) {
+          final int newDimIndex = newDimIndexes[dimIndex];
+          if (newDimIndex >= 0) {
+            newSubtotal.add(newDimIndex);
+          }
+        }
+
+        newSubtotalsList.add(newSubtotal);
+      }
+
+      newSubtotals = new Subtotals(newSubtotalsList);
+    } else {
+      newSubtotals = subtotals;
+    }
+
+    return Grouping.create(
+        newDimensions,
+        newSubtotals,
+        newAggregations,
+        havingFilter,
+        postAggregationProjection.getOutputRowSignature()
+    );
+  }
+
   @Override
-  public boolean equals(final Object o)
+  public boolean equals(Object o)
 
 Review comment:
   Is there a test for this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384085215
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer
 can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected
 column).
 
+The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS,
+for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city`
+followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to
+better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For
+example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )`
+and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand
+total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example,
+`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`.
+Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing
+`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the
+"country" and "city" columns.
+
+When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the
 
 Review comment:
   Other paragraphs in this section use plain text for simple keywords and preformatted text for examples; see https://druid.apache.org/docs/latest/querying/sql.html.
   
   This can be changed but I wanted to keep it consistent in this patch.
   
   What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-570666851
 
 
   @himanshug would you be able to comment on this part? Do you think it's reasonable to make this change?
   
   > Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
   > each grouping set. This is more in line with SQL standard behavior. I think it is okay
   > to make this change, since the old behavior was not documented, so users should
   > hopefully not be depending on it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384092390
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception
     );
   }
 
+  @Test
+  public void testGroupingSets() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithNumericDimension() throws Exception
+  {
+    testQuery(
+        "SELECT cnt, COUNT(*)\n"
+        + "FROM foo\n"
+        + "GROUP BY GROUPING SETS ( (cnt), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG)))
+                        .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("d0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1L, 6L},
+            new Object[]{null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByRollup() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY ROLLUP (dim2, gran)",
 
 Review comment:
   OK, I'll add it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384086779
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
 ##########
 @@ -706,7 +722,9 @@ private Query computeQuery()
   @Nullable
   public TimeseriesQuery toTimeseriesQuery()
   {
-    if (grouping == null || grouping.getHavingFilter() != null) {
+    if (grouping == null
+        || grouping.getSubtotals().hasEffect(grouping.getDimensionSpecs())
+        || grouping.getHavingFilter() != null) {
 
 Review comment:
   It's probably going to be an exceedingly small effect, I'd like to leave it this way if you don't mind.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384126844
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
 ##########
 @@ -706,7 +722,9 @@ private Query computeQuery()
   @Nullable
   public TimeseriesQuery toTimeseriesQuery()
   {
-    if (grouping == null || grouping.getHavingFilter() != null) {
+    if (grouping == null
+        || grouping.getSubtotals().hasEffect(grouping.getDimensionSpecs())
+        || grouping.getHavingFilter() != null) {
 
 Review comment:
   👍 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-591040754
 
 
   @suneet-s I just pushed some changes reflecting a couple of your comments. Please let me know what you think about the others.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] himanshug commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

himanshug commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-591720856
 
 
   > I think we can handle this through the release notes. I'll make a followup issue to add GROUPING and GROUPING_ID, which I think we will need to make this feature fully useful.
   
   sure, thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383429231
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer
 can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected
 column).
 
+The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS,
+for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city`
+followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to
+better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For
+example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )`
+and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand
+total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example,
+`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`.
+Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing
+`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the
+"country" and "city" columns.
+
+When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the
 
 Review comment:
   ```suggestion
   When using `GROUP BY GROUPING SETS`, `GROUP BY ROLLUP`, or `GROUP BY CUBE`, be aware that results may not be generated in the
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-591528953
 
 
   > I'm fine with whatever you prefer as it is confirmed that behavior is not being used. Added `Release Notes` tag(updated description as well) to this PR so that this gets mentioned in release notes and other users of subtotalsSpec can get a chance shout out when this goes in a release-candidate.
   
   I think we can handle this through the release notes. I'll make a followup issue to add GROUPING and GROUPING_ID, which I think we will need to make this feature fully useful.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384134632
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure:
 SELECT [ ALL | DISTINCT ] { * | exprs }
 FROM table
 [ WHERE expr ]
-[ GROUP BY exprs ]
+[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
 
 Review comment:
   I was thrown off by this in the docs 
   
   `[ EXPLAIN PLAN FOR ]
   [ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]`
   
   I didn't realize the parenthesis were required in this statement . I couldn't get an EXPLAIN query to work locally for me.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] sthetland commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

sthetland commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384101696
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer
 can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected
 column).
 
+The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS,
+for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city`
+followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to
+better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For
+example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )`
+and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand
+total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example,
+`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`.
+Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing
+`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the
+"country" and "city" columns.
+
+When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the
 
 Review comment:
   I think we can keep to the prevailing convention of plain text for inline keywords, with all caps providing the formatting distinction. 
   
   The underlying point is taken though—a series of keywords like this makes the text a little hard to scan.  One possible workaround, if the problem warrants it, would be to restructure the sentence to put the keywords in bullet format. For example --
   
   > When using the following, be aware that results may not be generated in the order that you specify your grouping sets in the query:
   > 
   > - GROUP BY GROUPING SETS 
   > - GROUP BY ROLLUP
   > - GROUP BY CUBE
   > 
   > If you need results... "
   > 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384085713
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure:
 SELECT [ ALL | DISTINCT ] { * | exprs }
 FROM table
 [ WHERE expr ]
-[ GROUP BY exprs ]
+[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
 
 Review comment:
   The parentheses mean literal characters here and everywhere else they appear in this example. The brackets, however, don't.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384087401
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception
     );
   }
 
+  @Test
+  public void testGroupingSets() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithNumericDimension() throws Exception
+  {
+    testQuery(
+        "SELECT cnt, COUNT(*)\n"
+        + "FROM foo\n"
+        + "GROUP BY GROUPING SETS ( (cnt), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG)))
+                        .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("d0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1L, 6L},
+            new Object[]{null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByRollup() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY ROLLUP (dim2, gran)",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByCube() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY CUBE (dim2, gran)",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithDummyDimension() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, 'dummy', gran), (dim2), (gran), ('dummy') )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v2",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v2", "v2", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v2"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of(),
+                                ImmutableList.of("v2")
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, null, 6L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsNoSuperset() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    // Note: the grouping sets are reordered in the output of this query, but this is allowed.
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByDimension() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY gran, dim2 DESC",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "v1",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    ),
+                                    new OrderByColumnSpec(
+                                        "v0",
+                                        Direction.DESCENDING,
+                                        StringComparators.LEXICOGRAPHIC
+                                    )
+                                ),
+                                Integer.MAX_VALUE
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"", null, 3L},
+            new Object[]{NULL_VALUE, null, 6L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByAggregator() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY SUM(cnt)\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                ),
+                                Integer.MAX_VALUE
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"", null, 3L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByAggregatorWithLimit() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY SUM(cnt)\n"
+        + "LIMIT 1",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                ),
+                                1
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L}
+        )
+    );
+  }
+
 
 Review comment:
   Are you referring to a specific test case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-590652648
 
 
   > If a limitSpec (containing sort spec) is applied in the end then that will mixup the rows and it would be difficult to identify which rows came from overall or which subtotalSpec . Oracle appears to support GROUPING function (Ref: https://docs.oracle.com/cd/B28359_01/server.111/b28314/tdpdw_sql.htm#TDPDW00714 ) to solve that problem in addition to having the behavior you described from `ROLLUP` and `CUBE` functions by adding '1' for dims not present in subtotal spec for rows created by subtotalSpec.
   > For native query users, what would be the alternative to achieve similar effect? For this reason, it might be good to have a feature flag for enabling `old` behavior .
   
   I think we will have to add something similar to `GROUPING` (and `GROUPING_ID`: https://docs.microsoft.com/en-us/sql/t-sql/functions/grouping-id-transact-sql?view=sql-server-ver15). I'm not sure what the best way is to do this. Maybe model it as an expression function, that people could use in a postaggregator or a having filter? But then we'd need some way of providing the current-grouping-set context to postaggregators and having filters somehow. I guess there's no way around that. Native query users would use the new function directly, and SQL users would get it through `GROUPING` and `GROUPING_ID`.
   
   I'd like to make that addition, but probably not in this patch. For now, you think I should add the 'legacy' flag or do you think I should document that things will get jumbled if you have a limitSpec? Personally, I'd rather do the latter, if people don't have use cases that depend on the legacy limitSpec behavior.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383441586
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java
 ##########
 @@ -369,13 +374,13 @@ public boolean doMergeResults(final GroupByQuery query)
                    .map(AggregatorFactory::getCombiningFactory)
                    .collect(Collectors.toList())
           )
-          .withSubtotalsSpec(null)
-          .withDimFilter(null);
-
+          .withVirtualColumns(VirtualColumns.EMPTY)
+          .withDimFilter(null)
+          .withSubtotalsSpec(null);
 
 Review comment:
   not really part of this change - maybe worth filing an issue for. The `with...` interface returns a query object, so chaining doesn't really make sense here. Maybe we should consider making that return the `Builder` so callers can call `.build()` when they're done setting all the fields. In this example it looks like it creates 4 intermediate GroupByQyery objects and a Builder for each object

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384213572
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java
 ##########
 @@ -141,36 +159,95 @@ public RowSignature getOutputRowSignature()
     return outputRowSignature;
   }
 
+  /**
+   * Applies a post-grouping projection.
+   *
+   * @see DruidQuery#computeGrouping which uses this
+   */
+  public Grouping applyProject(final PlannerContext plannerContext, final Project project)
+  {
+    final List<DimensionExpression> newDimensions = new ArrayList<>();
+    final List<Aggregation> newAggregations = new ArrayList<>(aggregations);
+    final Subtotals newSubtotals;
+
+    final Projection postAggregationProjection = Projection.postAggregation(
+        project,
+        plannerContext,
+        outputRowSignature,
+        "p"
+    );
+
+    postAggregationProjection.getPostAggregators().forEach(
+        postAggregator -> newAggregations.add(Aggregation.create(postAggregator))
+    );
+
+    // Remove literal dimensions that did not appear in the projection. This is useful for queries
+    // like "SELECT COUNT(*) FROM tbl GROUP BY 'dummy'" which some tools can generate, and for which we don't
+    // actually want to include a dimension 'dummy'.
+    final ImmutableBitSet aggregateProjectBits = RelOptUtil.InputFinder.bits(project.getChildExps(), null);
+    final int[] newDimIndexes = new int[dimensions.size()];
+
+    for (int i = 0; i < dimensions.size(); i++) {
+      final DimensionExpression dimension = dimensions.get(i);
+      if (Parser.parse(dimension.getDruidExpression().getExpression(), plannerContext.getExprMacroTable())
+                .isLiteral() && !aggregateProjectBits.get(i)) {
+        newDimIndexes[i] = -1;
+      } else {
+        newDimIndexes[i] = newDimensions.size();
+        newDimensions.add(dimension);
+      }
+    }
+
+    // Renumber subtotals, if needed, to account for removed dummy dimensions.
+    if (newDimensions.size() != dimensions.size()) {
+      final List<IntList> newSubtotalsList = new ArrayList<>();
+
+      for (IntList subtotal : subtotals.getSubtotals()) {
+        final IntList newSubtotal = new IntArrayList();
+        for (int dimIndex : subtotal) {
+          final int newDimIndex = newDimIndexes[dimIndex];
+          if (newDimIndex >= 0) {
+            newSubtotal.add(newDimIndex);
+          }
+        }
+
+        newSubtotalsList.add(newSubtotal);
+      }
+
+      newSubtotals = new Subtotals(newSubtotalsList);
+    } else {
+      newSubtotals = subtotals;
+    }
+
+    return Grouping.create(
+        newDimensions,
+        newSubtotals,
+        newAggregations,
+        havingFilter,
+        postAggregationProjection.getOutputRowSignature()
+    );
+  }
+
   @Override
-  public boolean equals(final Object o)
+  public boolean equals(Object o)
 
 Review comment:
   I'll add one.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383453509
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
 ##########
 @@ -706,7 +722,9 @@ private Query computeQuery()
   @Nullable
   public TimeseriesQuery toTimeseriesQuery()
   {
-    if (grouping == null || grouping.getHavingFilter() != null) {
+    if (grouping == null
+        || grouping.getSubtotals().hasEffect(grouping.getDimensionSpecs())
+        || grouping.getHavingFilter() != null) {
 
 Review comment:
   nit: it's cheaper to check if `havingFilter != null` before checking `hasEffect`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383461885
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception
     );
   }
 
+  @Test
+  public void testGroupingSets() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithNumericDimension() throws Exception
+  {
+    testQuery(
+        "SELECT cnt, COUNT(*)\n"
+        + "FROM foo\n"
+        + "GROUP BY GROUPING SETS ( (cnt), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG)))
+                        .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("d0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1L, 6L},
+            new Object[]{null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByRollup() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY ROLLUP (dim2, gran)",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByCube() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY CUBE (dim2, gran)",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithDummyDimension() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, 'dummy', gran), (dim2), (gran), ('dummy') )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v2",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v2", "v2", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v2"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of(),
+                                ImmutableList.of("v2")
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, null, 6L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsNoSuperset() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    // Note: the grouping sets are reordered in the output of this query, but this is allowed.
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByDimension() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY gran, dim2 DESC",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "v1",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    ),
+                                    new OrderByColumnSpec(
+                                        "v0",
+                                        Direction.DESCENDING,
+                                        StringComparators.LEXICOGRAPHIC
+                                    )
+                                ),
+                                Integer.MAX_VALUE
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"", null, 3L},
+            new Object[]{NULL_VALUE, null, 6L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByAggregator() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY SUM(cnt)\n",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                ),
+                                Integer.MAX_VALUE
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"", null, 3L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithOrderByAggregatorWithLimit() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (), (dim2), (gran) )\n"
+        + "ORDER BY SUM(cnt)\n"
+        + "LIMIT 1",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setLimitSpec(
+                            new DefaultLimitSpec(
+                                ImmutableList.of(
+                                    new OrderByColumnSpec(
+                                        "a0",
+                                        Direction.ASCENDING,
+                                        StringComparators.NUMERIC
+                                    )
+                                ),
+                                1
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"abc", null, 1L}
+        )
+    );
+  }
+
 
 Review comment:
   👍 nice tests!
   
   Is `GROUPING SETS ( (dummy) )` === `GROUPING SETS ( () )` or should it fail?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r383462950
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception
     );
   }
 
+  @Test
+  public void testGroupingSets() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setVirtualColumns(
+                            expressionVirtualColumn(
+                                "v0",
+                                "case_searched(notnull(\"dim2\"),\"dim2\",'')",
+                                ValueType.STRING
+                            ),
+                            expressionVirtualColumn(
+                                "v1",
+                                "timestamp_floor(\"__time\",'P1M',null,'UTC')",
+                                ValueType.LONG
+                            )
+                        )
+                        .setDimensions(
+                            dimensions(
+                                new DefaultDimensionSpec("v0", "v0"),
+                                new DefaultDimensionSpec("v1", "v1", ValueType.LONG)
+                            )
+                        )
+                        .setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("v0", "v1"),
+                                ImmutableList.of("v0"),
+                                ImmutableList.of("v1"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{"", timestamp("2000-01-01"), 2L},
+            new Object[]{"", timestamp("2001-01-01"), 1L},
+            new Object[]{"a", timestamp("2000-01-01"), 1L},
+            new Object[]{"a", timestamp("2001-01-01"), 1L},
+            new Object[]{"abc", timestamp("2001-01-01"), 1L},
+            new Object[]{"", null, 3L},
+            new Object[]{"a", null, 2L},
+            new Object[]{"abc", null, 1L},
+            new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L},
+            new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L},
+            new Object[]{NULL_VALUE, null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupingSetsWithNumericDimension() throws Exception
+  {
+    testQuery(
+        "SELECT cnt, COUNT(*)\n"
+        + "FROM foo\n"
+        + "GROUP BY GROUPING SETS ( (cnt), () )",
+        ImmutableList.of(
+            GroupByQuery.builder()
+                        .setDataSource(CalciteTests.DATASOURCE1)
+                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                        .setGranularity(Granularities.ALL)
+                        .setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG)))
+                        .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
+                        .setSubtotalsSpec(
+                            ImmutableList.of(
+                                ImmutableList.of("d0"),
+                                ImmutableList.of()
+                            )
+                        )
+                        .setContext(QUERY_CONTEXT_DEFAULT)
+                        .build()
+        ),
+        ImmutableList.of(
+            new Object[]{1L, 6L},
+            new Object[]{null, 6L}
+        )
+    );
+  }
+
+  @Test
+  public void testGroupByRollup() throws Exception
+  {
+    // Cannot vectorize due to virtual columns.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT dim2, gran, SUM(cnt)\n"
+        + "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n"
+        + "GROUP BY ROLLUP (dim2, gran)",
 
 Review comment:
   Maybe add a test for `ROLLUP (gran, dim2)` ie reverse order of the select columns. Not sure if we need the same thing for `CUBE` since it generates the same grouping sets with the reverse order.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#discussion_r384190521
 
 

 ##########
 File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java
 ##########
 @@ -92,19 +104,25 @@ private Grouping(
 
   public static Grouping create(
       final List<DimensionExpression> dimensions,
+      final Subtotals subtotals,
       final List<Aggregation> aggregations,
       final DimFilter havingFilter,
       final RowSignature outputRowSignature
   )
   {
-    return new Grouping(dimensions, aggregations, havingFilter, outputRowSignature);
+    return new Grouping(dimensions, subtotals, aggregations, havingFilter, outputRowSignature);
 
 Review comment:
   Should `havingFilter` be annotated with `@Nullable` for this method as well?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] himanshug commented on issue #9122: Add SQL GROUPING SETS support.

Posted by GitBox <gi...@apache.org>.

himanshug commented on issue #9122: Add SQL GROUPING SETS support.
URL: https://github.com/apache/druid/pull/9122#issuecomment-589828549

> Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
each grouping set. This is more in line with SQL standard behavior. I think it is okay
to make this change, since the old behavior was not documented, so users should
hopefully not be depending on it.

saw this today, I think, with original behavior, I tried to replicate semantics equivalent of sending separate queries for each subtotalsSpec individual (pre subtotalsSpec world) . That was the usecase at hand when subtotalsSpec feature was created. However, I can't recall if that usecase
did have needs for limitSpec. @a2l007 do you know if there is dependency of limit/sort spec behavior when subtotalsSpecs are used ?
If a limitSpec (containing sort spec) is applied in the end then that will mixup the rows and it would be difficult to identify which rows came from overall or which subtotalSpec . Oracle appears to support GROUPING function (Ref: https://docs.oracle.com/cd/B28359_01/server.111/b28314/tdpdw_sql.htm#TDPDW00714 ) to solve that problem in addition to having the behavior you described from `ROLLUP` and `CUBE` functions by adding '1' for dims not present in subtotal spec for rows created by subtotalSpec.
For native query users, what would be the alternative to achieve similar effect? For this reason, it might be good to have a feature flag for enabling `old` behavior .

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org