Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2016/11/22 16:13:58 UTC
[jira] [Closed] (SPARK-18358) Multiple Aggregation Using 'countDistinct' and 'first' result in error
[ https://issues.apache.org/jira/browse/SPARK-18358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell closed SPARK-18358.
-------------------------------------
Resolution: Duplicate
Fix Version/s: 2.0.2
> Multiple Aggregation Using 'countDistinct' and 'first' result in error
> -----------------------------------------------------------------------
>
> Key: SPARK-18358
> URL: https://issues.apache.org/jira/browse/SPARK-18358
> Project: Spark
> Issue Type: Bug
> Environment: Mac OS X 10.9.5
> Apache Spark 2.0.1
> Hadoop 1.4
> Reporter: Chris Nasrallah
> Fix For: 2.0.2
>
>
> Using pyspark, when I perform multiple aggregations on the same groupBy object that combine 'first' with more than one 'countDistinct', the call fails with a Py4JJavaError.
> {code:borderStyle=solid}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as sfn
> spark = SparkSession.builder.master('local').getOrCreate()
> df = spark.createDataFrame([
> (1, 'a', 'z'),
> (1, 'b', 'x'),
> (1, 'a', 'y'),
> (1, 'a', 'x'),
> (2, 'b', 'z'),
> (2, 'b', 'z')
> ], ['id', 'var1', 'var2'])
> ## Using two 'first' and one 'countDistinct' aggregations works
> df.groupby('id') \
> .agg(sfn.first('var1'), \
> sfn.first('var2'), \
> sfn.countDistinct('var1')).show()
>
> ## Using one 'max' with both 'countDistinct' works:
> df.groupby('id') \
> .agg(sfn.max('var2'), \
> sfn.countDistinct('var1'), \
> sfn.countDistinct('var2')).show()
> ## But using both 'countDistinct' with at least one 'first' crashes
> df.groupby('id') \
> .agg(sfn.first('var1'), \
> sfn.first('var2'), \
> sfn.countDistinct('var1'), \
> sfn.countDistinct('var2')) \
> .show()
> {code}
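For reference, the result the failing call would presumably be expected to produce can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `expected_aggregates` is a hypothetical helper name, and it takes the "first" value in list order, whereas Spark's `first` without an explicit ordering is not guaranteed to pick any particular row.

```python
from collections import OrderedDict

# Same rows as in the report: (id, var1, var2)
rows = [
    (1, 'a', 'z'),
    (1, 'b', 'x'),
    (1, 'a', 'y'),
    (1, 'a', 'x'),
    (2, 'b', 'z'),
    (2, 'b', 'z'),
]


def expected_aggregates(rows):
    """Hypothetical helper: per id, the first var1/var2 seen (in list order)
    and the number of distinct var1/var2 values."""
    groups = OrderedDict()
    for id_, var1, var2 in rows:
        # The default dict is only stored the first time an id is seen,
        # so 'first_var1' and 'first_var2' keep the first row's values.
        group = groups.setdefault(id_, {
            'first_var1': var1,
            'first_var2': var2,
            'distinct_var1': set(),
            'distinct_var2': set(),
        })
        group['distinct_var1'].add(var1)
        group['distinct_var2'].add(var2)
    return [
        (id_, g['first_var1'], g['first_var2'],
         len(g['distinct_var1']), len(g['distinct_var2']))
        for id_, g in groups.items()
    ]


result = expected_aggregates(rows)
for row in result:
    print(row)
```

Each tuple is (id, first var1, first var2, distinct count of var1, distinct count of var2), mirroring the four aggregations in the crashing call.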
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)