You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/03 14:05:12 UTC

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP

AngersZhuuuu commented on a change in pull request #30144:
URL: https://github.com/apache/spark/pull/30144#discussion_r515926187



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -3691,6 +3691,32 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       checkAnswer(sql("SELECT id FROM t WHERE (SELECT true)"), Row(0L))
     }
   }
+
+  test("SPARK-33229: Support GROUP BY use Separate columns and CUBE/ROLLUP") {
+    withTable("t") {
+      sql("CREATE TABLE t USING PARQUET AS SELECT id AS a, id AS b, id AS c FROM range(1)")
+      checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY CUBE(a, b, c)"),
+        Row(0, 0, 0, 1) :: Row(0, 0, null, 1) ::
+          Row(0, null, 0, 1) :: Row(0, null, null, 1) ::
+          Row(null, 0, 0, 1) :: Row(null, 0, null, 1) ::
+          Row(null, null, 0, 1) :: Row(null, null, null, 1) :: Nil)
+      checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY a, CUBE(b, c)"),

Review comment:
       > what's the semantic of it?
   
   If we want some dimensional analysis group by `a` and  different dimensional about combine `b` & `c`, in current we need to write 
   `group by cube(a, b, c)` and `where a !=NULL` to remove interfering data, with this patch we can just write 
   
   ```
   group by a, cube(b, c)
   ```
   And this set of PR can make  Grouping Analytics more flexible as Postgres SQL.  And we do have this need for analysis。
   
   

##########
File path: sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql
##########
@@ -59,4 +59,12 @@ SELECT course, year FROM courseSales GROUP BY CUBE(course, year) ORDER BY groupi
 -- Aliases in SELECT could be used in ROLLUP/CUBE/GROUPING SETS
 SELECT a + b AS k1, b AS k2, SUM(a - b) FROM testData GROUP BY CUBE(k1, k2);
 SELECT a + b AS k, b, SUM(a - b) FROM testData GROUP BY ROLLUP(k, b);
-SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k)
+SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k);
+
+-- GROUP BY use mixed Separate columns and CUBE/ROLLUP
+SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b);
+SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b) GROUPING SETS(a, b);

Review comment:
       This is not unsupported, but it can be fixed easy after refactor GROUPING ANALYTICS




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org