Posted to issues@spark.apache.org by "Li Jin (JIRA)" <ji...@apache.org> on 2019/07/17 14:25:00 UTC
[jira] [Created] (SPARK-28422) GROUPED_AGG pandas_udf doesn't work with spark.sql without group by clause
Li Jin created SPARK-28422:
------------------------------
Summary: GROUPED_AGG pandas_udf doesn't work with spark.sql without group by clause
Key: SPARK-28422
URL: https://issues.apache.org/jira/browse/SPARK-28422
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 2.4.3
Reporter: Li Jin
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf('double', PandasUDFType.GROUPED_AGG)
def max_udf(v):
    return v.max()

df = spark.range(0, 100)
spark.udf.register('max_udf', max_udf)
df.createTempView('table')

# A. This works
df.agg(max_udf(df['id'])).show()

# B. This doesn't work
spark.sql("select max_udf(id) from table").show(){code}
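For reference, the UDF body itself is fine: a GROUPED_AGG pandas_udf is a function from a pandas Series to a scalar, and that part can be checked outside Spark (a minimal sketch, assuming only pandas is installed):

```python
import pandas as pd

# Same body as the UDF above: reduce a whole Series to one scalar,
# which is the GROUPED_AGG contract (Series -> scalar).
def max_udf(v):
    return v.max()

s = pd.Series(range(100))
print(max_udf(s))  # 99
```

So the failure is in how the SQL path plans the call, not in the function itself.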
Query plan:
A:
{code:java}
== Parsed Logical Plan ==
'Aggregate [max_udf('id) AS max_udf(id)#140]
+- Range (0, 1000, step=1, splits=Some(4))
== Analyzed Logical Plan ==
max_udf(id): double
Aggregate [max_udf(id#64L) AS max_udf(id)#140]
+- Range (0, 1000, step=1, splits=Some(4))
== Optimized Logical Plan ==
Aggregate [max_udf(id#64L) AS max_udf(id)#140]
+- Range (0, 1000, step=1, splits=Some(4))
== Physical Plan ==
!AggregateInPandas [max_udf(id#64L)], [max_udf(id)#138 AS max_udf(id)#140]
+- Exchange SinglePartition
+- *(1) Range (0, 1000, step=1, splits=4)
{code}
B:
{code:java}
== Parsed Logical Plan ==
'Project [unresolvedalias('max_udf('id), None)]
+- 'UnresolvedRelation [table]
== Analyzed Logical Plan ==
max_udf(id): double
Project [max_udf(id#0L) AS max_udf(id)#136]
+- SubqueryAlias `table`
+- Range (0, 100, step=1, splits=Some(4))
== Optimized Logical Plan ==
Project [max_udf(id#0L) AS max_udf(id)#136]
+- Range (0, 100, step=1, splits=Some(4))
== Physical Plan ==
*(1) Project [max_udf(id#0L) AS max_udf(id)#136]
+- *(1) Range (0, 100, step=1, splits=4)
{code}
Maybe related to the subquery alias? In plan B the analyzer resolves the call as a plain Project over SubqueryAlias instead of an Aggregate, so no AggregateInPandas node is ever planned.
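The expected behavior for case B is what most SQL engines do: a user-defined aggregate invoked without a GROUP BY clause is treated as a global aggregate over all rows. A hedged analogy using the Python stdlib's sqlite3 module (names `MaxUdf` and `t` are illustrative, not from the report):

```python
import sqlite3

# A user-defined aggregate in SQLite: step() folds in each row,
# finalize() returns the scalar result.
class MaxUdf:
    def __init__(self):
        self.cur = None

    def step(self, value):
        if self.cur is None or value > self.cur:
            self.cur = value

    def finalize(self):
        return self.cur

con = sqlite3.connect(":memory:")
con.create_aggregate("max_udf", 1, MaxUdf)
con.execute("create table t(id integer)")
con.executemany("insert into t values (?)", [(i,) for i in range(100)])

# No GROUP BY: the engine still plans this as one global aggregate.
print(con.execute("select max_udf(id) from t").fetchone()[0])  # 99
```

This is the treatment Spark's DataFrame path (case A) already gives the pandas UDF, but the spark.sql path (case B) does not.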
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)