Posted to user@spark.apache.org by Somasundaram Sekar <so...@tigeranalytics.com> on 2017/10/07 17:12:17 UTC

DataFrame multiple agg on the same column

Hi,

I have a GroupedData object on which I aggregate a few columns. Since
GroupedData.agg takes a map, I cannot apply multiple aggregates to the
same column; say I want both the max and the min of amount.

So the line of code below returns only one aggregate per column:

grouped_txn.agg({'*' : 'count', 'amount' : 'sum', 'amount' : 'max',
'created_time' : 'min', 'created_time' : 'max'})
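
The loss of aggregates here comes from Python rather than Spark: a dict literal with repeated keys keeps only the last value for each key, so the duplicates are gone before agg ever receives the map. A minimal sketch in plain Python, with no Spark involved (the name agg_spec is only illustrative):

```python
# A dict literal with repeated keys silently keeps the last value per key.
agg_spec = {'*': 'count', 'amount': 'sum', 'amount': 'max',
            'created_time': 'min', 'created_time': 'max'}

# Only three entries survive; 'sum' and 'min' were overwritten
# by the later 'max' entries for the same keys.
print(agg_spec)
print(len(agg_spec))
```

Because the map holds one entry per column name, it cannot express two aggregates on the same column no matter how it is written.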

What are the possible alternatives? I could define a new column that is
just a copy of the original and use that, but that looks ugly. Any
suggestions?

Thanks,
Somasundaram S

Re: DataFrame multiple agg on the same column

Posted by yohann jardin <yo...@hotmail.com>.
Hey Somasundaram,

Using a map is only one way to use the function agg. For the complete list: https://spark.apache.org/docs/1.5.2/api/java/org/apache/spark/sql/GroupedData.html

Using the first one, agg(Column expr, Column... exprs):
import org.apache.spark.sql.functions._  // count, lit, sum, max, min
grouped_txn.agg(count(lit(1)), sum('amount), max('amount), min('created_time), max('created_time)).show

Yohann Jardin
