Posted to issues@spark.apache.org by "Herman van Hövell (Jira)" <ji...@apache.org> on 2023/02/25 02:36:00 UTC

[jira] [Created] (SPARK-42576) Add 2nd groupBy method to Dataset

Herman van Hövell created SPARK-42576:
-----------------------------------------

             Summary: Add 2nd groupBy method to Dataset
                 Key: SPARK-42576
                 URL: https://issues.apache.org/jira/browse/SPARK-42576
             Project: Spark
          Issue Type: New Feature
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Herman van Hövell


The Connect Dataset is missing the groupBy variant that takes column names instead of Column expressions:
{code:java}
/**
 * Groups the Dataset using the specified columns, so that we can run aggregation on them.
 * See [[RelationalGroupedDataset]] for all the available aggregate functions.
 *
 * This is a variant of groupBy that can only group by existing columns using column names
 * (i.e. cannot construct expressions).
 *
 * {{{
 *   // Compute the average for all numeric columns grouped by department.
 *   ds.groupBy("department").avg()
 *
 *   // Compute the max age and average salary, grouped by department and gender.
 *   ds.groupBy("department", "gender").agg(Map(
 *     "salary" -> "avg",
 *     "age" -> "max"
 *   ))
 * }}}
 * @group untypedrel
 * @since 3.4.0
 */
@scala.annotation.varargs
def groupBy(col1: String, cols: String*): RelationalGroupedDataset
{code}
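
For illustration, here is a minimal sketch of how the name-based variant reads next to the Column-based one. It assumes a classic local SparkSession rather than a Connect session, since that is where this overload already exists. The object name GroupByNamesExample, the helper aggregate, and the sample rows are hypothetical and not part of the issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max}

// Hypothetical example object; not part of the Spark code base.
object GroupByNamesExample {

  // Returns the same aggregation computed twice: once grouping by column
  // names (the variant requested in this issue) and once by Column objects.
  def aggregate(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val ds = Seq(
      ("eng", "f", 30, 120.0),
      ("eng", "f", 40, 100.0),
      ("eng", "m", 35, 110.0),
      ("ops", "m", 50, 90.0)
    ).toDF("department", "gender", "age", "salary")

    // Name-based variant: can only refer to existing columns.
    val byName = ds
      .groupBy("department", "gender")
      .agg(Map("salary" -> "avg", "age" -> "max"))

    // Column-based variant: accepts arbitrary Column expressions.
    val byColumn = ds
      .groupBy($"department", $"gender")
      .agg(avg($"salary"), max($"age"))

    (byName, byColumn)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a classic local session; a Connect client would be
    // configured differently.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupBy-by-name")
      .getOrCreate()

    val (byName, byColumn) = aggregate(spark)
    byName.show()
    byColumn.show()

    spark.stop()
  }
}
```

Both calls group on the same two columns. The name-based form is shorter when no expressions are needed, which is the convenience this issue asks to bring to the Connect Dataset.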
