Posted to issues@spark.apache.org by "Herman van Hövell (Jira)" <ji...@apache.org> on 2023/02/25 02:36:00 UTC
[jira] [Created] (SPARK-42576) Add 2nd groupBy method to Dataset
Herman van Hövell created SPARK-42576:
-----------------------------------------
Summary: Add 2nd groupBy method to Dataset
Key: SPARK-42576
URL: https://issues.apache.org/jira/browse/SPARK-42576
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell
Dataset is missing the groupBy overload that takes column names as strings:
{code:java}
/**
 * Groups the Dataset using the specified columns, so that we can run aggregation on them.
 * See [[RelationalGroupedDataset]] for all the available aggregate functions.
 *
 * This is a variant of groupBy that can only group by existing columns using column names
 * (i.e. cannot construct expressions).
 *
 * {{{
 *   // Compute the average for all numeric columns grouped by department.
 *   ds.groupBy("department").avg()
 *
 *   // Compute the max age and average salary, grouped by department and gender.
 *   ds.groupBy($"department", $"gender").agg(Map(
 *     "salary" -> "avg",
 *     "age" -> "max"
 *   ))
 * }}}
 * @group untypedrel
 * @since 3.4.0
 */
@scala.annotation.varargs
def groupBy(col1: String, cols: String*): RelationalGroupedDataset
{code}
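
For illustration, a minimal sketch of how the name-based overload is called from user code. It assumes a classic local SparkSession (a Connect client session would be built differently) and uses hypothetical sample data; the object name, helper names, and rows are not part of this issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupByColumnNamesSketch {
  // Hypothetical sample data; the column names mirror the Scaladoc example.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("eng", "f", 30, 120.0),
      ("eng", "m", 40, 100.0),
      ("ops", "f", 35, 90.0)
    ).toDF("department", "gender", "age", "salary")
  }

  // One column name: resolves to groupBy(col1: String, cols: String*) with empty varargs.
  def byDepartment(df: DataFrame): DataFrame =
    df.groupBy("department")
      .agg(Map("salary" -> "avg", "age" -> "max"))
      .orderBy("department")

  // Two column names: the second one is passed through the String* varargs.
  def byDepartmentAndGender(df: DataFrame): DataFrame =
    df.groupBy("department", "gender")
      .agg(Map("salary" -> "avg", "age" -> "max"))
      .orderBy("department", "gender")

  def main(args: Array[String]): Unit = {
    // Assumption: a local, non-Connect session, used only to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupBy-by-name")
      .getOrCreate()
    try {
      val df = sample(spark)
      byDepartment(df).show()
      byDepartmentAndGender(df).show()
    } finally {
      spark.stop()
    }
  }
}
```

Because both calls pass plain strings, they select the String overload rather than the Column-based groupBy, which is the variant this issue asks to add.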