Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/05/12 07:24:00 UTC

[jira] [Commented] (SPARK-35346) More clause needed for combining groupby and cube

    [ https://issues.apache.org/jira/browse/SPARK-35346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343066#comment-17343066 ] 

Hyukjin Kwon commented on SPARK-35346:
--------------------------------------

It would be very helpful if you could add some references to DBMSes that support "group by xxx, xxx, cube(xxx, xxx)".
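
For context, this mixed form is part of the SQL standard's grouping analytics; DBMSes such as PostgreSQL and Oracle accept it, for example. A minimal sketch (table and column names are illustrative, not from the ticket):

    SELECT region, year, product, sum(amount)
    FROM sales
    GROUP BY region, CUBE (year, product);
    -- shorthand for GROUPING SETS ((region, year, product),
    --                              (region, year), (region, product), (region))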

> More clause needed for combining groupby and cube
> -------------------------------------------------
>
>                 Key: SPARK-35346
>                 URL: https://issues.apache.org/jira/browse/SPARK-35346
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.0, 3.0.2, 3.1.1
>            Reporter: Kai
>            Priority: Major
>
> In PySpark, an aggregation must follow a groupBy, rollup, or cube clause, and I think this part needs more flexibility. In SQL we can write "group by xxx, xxx, cube(xxx, xxx)", but in PySpark there is no way to cube over just some of the fields while plainly grouping by the others; you have to cube over all of them, which costs much more and produces data you do not need. So I think we should improve this. Thank you!
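
A minimal PySpark sketch of the gap described above, assuming a hypothetical DataFrame with columns x, y, z, v (names are illustrative, not from the ticket). The DataFrame API today can only cube over every grouping column, while the per-column mix is already expressible through GROUPING SETS in Spark SQL:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("a", "p", "q", 1), ("a", "p", "r", 2), ("b", "s", "q", 3)],
        ["x", "y", "z", "v"],
    )

    # DataFrame API: cube() applies to all listed columns -- 2^3 = 8
    # grouping sets, including rollups over x that the reporter does not want.
    df.cube("x", "y", "z").agg(F.sum("v")).show()

    # Workaround: keep x in every grouping set via GROUPING SETS in SQL,
    # which is exactly what "GROUP BY x, CUBE(y, z)" would expand to.
    df.createOrReplaceTempView("t")
    spark.sql("""
        SELECT x, y, z, sum(v) AS total
        FROM t
        GROUP BY x, y, z GROUPING SETS ((x, y, z), (x, y), (x, z), (x))
    """).show()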



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org