Posted to issues@spark.apache.org by "Herman van Hövell (Jira)" <ji...@apache.org> on 2023/02/25 02:36:00 UTC
[jira] [Updated] (SPARK-42576) Add s2nd groupBy method to Dataset
[ https://issues.apache.org/jira/browse/SPARK-42576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hövell updated SPARK-42576:
--------------------------------------
Summary: Add s2nd groupBy method to Dataset (was: Add 2nd groupBy method to Dataset)
> Add s2nd groupBy method to Dataset
> ----------------------------------
>
> Key: SPARK-42576
> URL: https://issues.apache.org/jira/browse/SPARK-42576
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
>
> The Connect Dataset is missing the groupBy overload that takes column names as strings:
> {code:java}
> /**
> * Groups the Dataset using the specified columns, so that we can run aggregation on them.
> * See [[RelationalGroupedDataset]] for all the available aggregate functions.
> *
> * This is a variant of groupBy that can only group by existing columns using column names
> * (i.e. cannot construct expressions).
> *
> * {{{
> * // Compute the average for all numeric columns grouped by department.
> * ds.groupBy("department").avg()
> *
> * // Compute the max age and average salary, grouped by department and gender.
> * ds.groupBy("department", "gender").agg(Map(
> *   "salary" -> "avg",
> *   "age" -> "max"
> * ))
> * }}}
> * @group untypedrel
> * @since 3.4.0
> */
> @scala.annotation.varargs
> def groupBy(col1: String, cols: String*): RelationalGroupedDataset
> {code}
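
One way a name-based overload like the one above could be layered on a Column-based one is sketched below. This is a minimal, self-contained illustration and not the Spark implementation: Column, GroupedSketch, DatasetSketch and GroupBySketch are hypothetical stand-ins defined here, and only the groupBy(col1: String, cols: String*) signature is taken from the issue.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
final case class Column(name: String)
final case class GroupedSketch(groupingColumns: Seq[Column])

final class DatasetSketch {
  // Column-based overload: the general form that accepts column expressions.
  @scala.annotation.varargs
  def groupBy(cols: Column*): GroupedSketch = GroupedSketch(cols.toSeq)

  // Name-based overload with the signature from the issue. The mandatory
  // first parameter means at least one column name must be supplied.
  @scala.annotation.varargs
  def groupBy(col1: String, cols: String*): GroupedSketch = {
    val names = col1 +: cols
    // Wrap each name in a Column and delegate to the overload above.
    groupBy(names.map(Column(_)): _*)
  }
}

object GroupBySketch {
  def main(args: Array[String]): Unit = {
    val ds = new DatasetSketch
    val byName = ds.groupBy("department", "gender")
    val byColumn = ds.groupBy(Column("department"), Column("gender"))
    // Both calls should describe the same grouping.
    println(byName == byColumn)
  }
}
```

In this sketch the String overload only converts names and forwards them, so the two entry points cannot drift apart. Whether the Connect client resolves names this way is an assumption, not something the issue states.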