Posted to issues@spark.apache.org by "Priyanka Garg (JIRA)" <ji...@apache.org> on 2016/06/07 04:33:20 UTC

[jira] [Created] (SPARK-15797) To expose groupingSets for DataFrame

Priyanka Garg created SPARK-15797:
-------------------------------------

             Summary: To expose groupingSets for DataFrame
                 Key: SPARK-15797
                 URL: https://issues.apache.org/jira/browse/SPARK-15797
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 1.5.1
            Reporter: Priyanka Garg


Currently, the cube and rollup functions are exposed on DataFrame, but grouping sets are not.
For example,
df.rollup($"department", $"group", $"designation").avg() results in:
a. All combinations of department, group and designation
b. All combinations of department and group, with designation as null
c. All departments, with group and designation as null
d. Department, group and designation all null (i.e. aggregating over the complete data)
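
To make the rollup expansion above concrete, here is a minimal sketch in plain Scala (no Spark dependency). The object name RollupSets and the helper rollupSets are hypothetical, introduced only for illustration; they are not part of the Spark API. The sketch only lists which column groups a rollup aggregates over, assuming rollup means "every prefix of the column list, longest first".

```scala
object RollupSets {
  // Hypothetical helper: the grouping sets a rollup over `cols` expands to,
  // i.e. every prefix of the column list, from the full list down to the empty set.
  def rollupSets(cols: Seq[String]): Seq[Seq[String]] =
    (cols.length to 0 by -1).map(n => cols.take(n))

  def main(args: Array[String]): Unit = {
    // Prints one grouping set per line; columns left out of a set
    // would appear as null in the aggregated result.
    rollupSets(Seq("department", "group", "designation"))
      .foreach(s => println(s.mkString("(", ", ", ")")))
  }
}
```

The four sets produced correspond to cases a through d above.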

Along the same lines, there should be a groupingSets function in which custom groupings can be specified.
For example,
df.groupingSets(($"department", $"group", $"designation"), ($"group"), ($"designation"), ()).avg()
This should result in:
1. All combinations of department, group and designation
2. All values of group, with department and designation as null
3. All values of designation, with department and group as null
4. Aggregation over the complete data, i.e. with department, group and designation all null
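
The expected result of the proposed function can be sketched with plain Scala collections. This is not Spark code and not a proposed implementation: the Emp case class, its salary column, the sample rows, and the groupingSets helper below are all assumptions made for illustration. Columns that are not part of a grouping set are represented as None, standing in for null.

```scala
object GroupingSetsSketch {
  // Hypothetical sample row; the salary column is an assumption for illustration.
  case class Emp(department: String, group: String, designation: String, salary: Double)

  // Grouping column names paired with their accessors, in output order.
  val columns: Seq[(String, Emp => String)] = Seq(
    "department"  -> ((e: Emp) => e.department),
    "group"       -> ((e: Emp) => e.group),
    "designation" -> ((e: Emp) => e.designation)
  )

  // For each grouping set: group rows by the columns in the set, emit None
  // for the columns outside it, and average salary within each group.
  def groupingSets(rows: Seq[Emp], sets: Seq[Set[String]]): Seq[(Seq[Option[String]], Double)] =
    sets.flatMap { set =>
      rows
        .groupBy(r => columns.map { case (name, get) => if (set(name)) Some(get(r)) else None })
        .map { case (key, grp) => key -> grp.map(_.salary).sum / grp.size }
        .toSeq
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Emp("eng", "g1", "dev", 100.0),
      Emp("eng", "g1", "lead", 200.0),
      Emp("ops", "g2", "dev", 300.0)
    )
    // The same four grouping sets as in the example above.
    val sets = Seq(
      Set("department", "group", "designation"),
      Set("group"),
      Set("designation"),
      Set.empty[String]
    )
    groupingSets(rows, sets).foreach(println)
  }
}
```

The empty set yields a single row with all three keys as None, which matches case 4 (aggregation over the complete data).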




