Posted to reviews@spark.apache.org by icexelloss <gi...@git.apache.org> on 2018/01/02 15:30:38 UTC
[GitHub] spark pull request #19872: [SPARK-22274][PySpark] User-defined aggregation f...
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159248298
--- Diff: python/pyspark/sql/group.py ---
@@ -82,6 +91,13 @@ def agg(self, *exprs):
>>> from pyspark.sql import functions as F
>>> sorted(gdf.agg(F.min(df.age)).collect())
[Row(name=u'Alice', min(age)=2), Row(name=u'Bob', min(age)=5)]
+
+ >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+ >>> @pandas_udf('int', PandasUDFType.GROUP_AGG)
+ ... def min_udf(v):
+ ...     return v.min()
+ >>> sorted(gdf.agg(min_udf(df.age)).collect()) # doctest: +SKIP
--- End diff --
I don't know of a good way to skip this doctest when pyarrow is not available. If anyone has ideas, please let me know.
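One possible direction, as a rough sketch rather than what this PR does: decide at test-collection time whether a docstring's examples should run, based on whether the optional module can be imported. The names below (`module_available`, `OPTIONAL_DEPS`, `run_doctests`, and the two toy functions) are hypothetical and exist only for illustration. Only the standard-library `doctest` and `importlib.util` calls are real.

```python
import doctest
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def min_plain(values):
    """
    >>> min_plain([5, 2])
    2
    """
    return min(values)


def min_needs_arrow(values):
    """
    >>> import pyarrow
    >>> min_needs_arrow([5, 2])
    2
    """
    return min(values)


# Hypothetical registry: the optional module each function's doctest needs.
OPTIONAL_DEPS = {"min_needs_arrow": "pyarrow"}


def run_doctests(funcs, optional_deps, is_available=module_available):
    """Run each function's doctest unless its optional dependency is missing.

    Returns a tuple (attempted, failed, skipped_names).
    """
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {f.__name__: f for f in funcs}
    skipped = []
    for func in funcs:
        dep = optional_deps.get(func.__name__)
        if dep is not None and not is_available(dep):
            # Dependency missing: do not even parse this docstring's examples.
            skipped.append(func.__name__)
            continue
        # Each test gets its own copy of globs; the runner clears it afterwards.
        test = parser.get_doctest(
            func.__doc__, dict(globs), func.__name__, "<sketch>", 0
        )
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.attempted, results.failed, skipped
```

With something like this, the `# doctest: +SKIP` directive would not be needed on the example itself: the example runs wherever pyarrow is installed and is left out elsewhere. The cost is that the list of pyarrow-dependent docstrings has to be maintained by hand.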
---