
Posted to issues@spark.apache.org by "Priyanka Garg (JIRA)" <ji...@apache.org> on 2016/06/07 04:33:20 UTC

[jira] [Created] (SPARK-15797) To expose groupingSets for DataFrame

Priyanka Garg created SPARK-15797:
-------------------------------------

Summary: To expose groupingSets for DataFrame
Key: SPARK-15797
URL: https://issues.apache.org/jira/browse/SPARK-15797
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 1.5.1
Reporter: Priyanka Garg

Currently, the cube and rollup functions are exposed on DataFrame, but grouping sets are not.
For example,
df.rollup($"department", $"group", $"designation").avg() produces:
a. One aggregate for each combination of department, group and designation
b. One aggregate for each combination of department and group, with designation as null
c. One aggregate for each department, with group and designation as null
d. One aggregate over the complete data, with department, group and designation all as null
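
The four cases above are the prefixes of the column list passed to rollup. A minimal plain-Scala sketch of that expansion follows; it does not use Spark, and the object and function names (RollupSets, rollupSets) are hypothetical, chosen only for illustration.

```scala
// Plain-Scala illustration (no Spark dependency): rollup over a column list
// aggregates over every prefix of that list, from the full list down to the
// empty list (the aggregate over the complete data).
object RollupSets {
  // Hypothetical helper: returns the grouping sets implied by a rollup.
  def rollupSets(cols: Seq[String]): Seq[Seq[String]] =
    (cols.length to 0 by -1).map(n => cols.take(n))

  def main(args: Array[String]): Unit = {
    // Prints one grouping set per line, matching cases a to d above.
    rollupSets(Seq("department", "group", "designation")).foreach { set =>
      println(set.mkString("(", ", ", ")"))
    }
  }
}
```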

Along the same lines, there should be a groupingSets function that lets the caller specify custom groupings.
For example,
df.groupingSets(($"department", $"group", $"designation"), ($"group") ,($"designation"), () ).avg()
This should produce:
1. One aggregate for each combination of department, group and designation
2. One aggregate for each value of group, with department and designation as null
3. One aggregate for each value of designation, with department and group as null
4. One aggregate over the complete data, with department, group and designation all as null
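
To make the requested semantics concrete, here is a rough sketch that emulates them on an in-memory collection in plain Scala. It is not a Spark implementation and not a proposed signature: the Employee row type, its salary field, and the groupingSets helper are all assumptions made for illustration, and None stands in for the nulls described above.

```scala
object GroupingSetsSketch {
  // Hypothetical row type; the salary column is an assumption for illustration.
  case class Employee(department: String, group: String, designation: String, salary: Double)

  // Column name paired with an accessor, for the three grouping columns.
  val columns: Seq[(String, Employee => String)] = Seq(
    "department"  -> ((e: Employee) => e.department),
    "group"       -> ((e: Employee) => e.group),
    "designation" -> ((e: Employee) => e.designation)
  )

  // For each grouping set, group the rows by the listed columns and average
  // the salary. Columns outside the set are reported as None.
  def groupingSets(rows: Seq[Employee],
                   sets: Seq[Set[String]]): Seq[(Seq[Option[String]], Double)] =
    sets.flatMap { set =>
      rows
        .groupBy(e => columns.map { case (name, get) => if (set(name)) Some(get(e)) else None })
        .map { case (key, grp) => key -> grp.map(_.salary).sum / grp.size }
        .toSeq
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Employee("eng", "platform", "dev", 100.0),
      Employee("eng", "platform", "lead", 200.0),
      Employee("ops", "platform", "dev", 300.0)
    )
    // The same four grouping sets as in the example above.
    val sets = Seq(
      Set("department", "group", "designation"),
      Set("group"),
      Set("designation"),
      Set.empty[String]
    )
    groupingSets(rows, sets).foreach(println)
  }
}
```

Each grouping set is evaluated independently and the results are concatenated, so the output has the same shape as a union of four separate group-by queries.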

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
