Posted to reviews@spark.apache.org by icexelloss <gi...@git.apache.org> on 2018/01/02 15:30:38 UTC

[GitHub] spark pull request #19872: [SPARK-22274][PySpark] User-defined aggregation f...

Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r159248298
  
    --- Diff: python/pyspark/sql/group.py ---
    @@ -82,6 +91,13 @@ def agg(self, *exprs):
             >>> from pyspark.sql import functions as F
             >>> sorted(gdf.agg(F.min(df.age)).collect())
             [Row(name=u'Alice', min(age)=2), Row(name=u'Bob', min(age)=5)]
    +
    +        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +        >>> @pandas_udf('int', PandasUDFType.GROUP_AGG)
    +        ... def min_udf(v):
    +        ...     return v.min()
    +        >>> sorted(gdf.agg(min_udf(df.age)).collect())  # doctest: +SKIP
    --- End diff --
    
    I don't know of a good way to skip this doctest when pyarrow is not available. If anyone has ideas, please let me know.
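
One possible direction, offered only as a sketch: check whether the optional dependency is importable and, if it is not, set the `doctest.SKIP` option on every parsed example before running it. The code below uses only the standard library. The helper names `module_available` and `run_docstring` are hypothetical and not part of PySpark, and how PySpark's own doctest runner would hook this in is left open.

```python
import doctest
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def run_docstring(docstring, requires, name="example"):
    """Run the doctest examples in `docstring`.

    If the module named by `requires` is missing, every example is
    marked with doctest.SKIP, so nothing is executed or counted.
    Hypothetical helper for illustration only.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, {}, name, None, 0)

    if not module_available(requires):
        for example in test.examples:
            # Same effect as writing "# doctest: +SKIP" on each example.
            example.options[doctest.SKIP] = True

    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted
```

With this in place, a call such as `run_docstring(some_function.__doc__, requires="pyarrow")` would run the examples only on machines where pyarrow is installed. The trade-off is that the skip applies to the whole docstring rather than to individual examples, which may be too coarse for a docstring like the one in `agg` that mixes Arrow-dependent and Arrow-independent examples.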

