Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/10/07 15:17:28 UTC

[GitHub] spark pull request #22610: [SPARK-25461][PySpark][SQL] Add document for mism...

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22610#discussion_r223217249
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2909,6 +2909,12 @@ def pandas_udf(f=None, returnType=None, functionType=None):
             can fail on special rows, the workaround is to incorporate the condition into the functions.
     
         .. note:: The user-defined functions do not take keyword arguments on the calling side.
    +
    +    .. note:: The data type of the `pandas.Series` returned by the user-defined functions
    +        should match the defined returnType (see :meth:`types.to_arrow_type` and
    +        :meth:`types.from_arrow_type`). When there is a mismatch between them, Spark might
    +        convert the returned data. The conversion is not guaranteed to be correct, so users
    +        should check the results for accuracy.
    --- End diff --
    
    I am merging this since it describes the current status, but let's make it clearer and try to get rid of this note within 3.0.
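
    For reference, a minimal sketch (hypothetical, not part of this PR) of the kind of mismatch the note warns about, assuming Spark 2.4 with PyArrow available; the illustrative `plus_half` UDF is declared to return `long` but produces a float64 `pandas.Series`, so Spark has to convert the returned data:

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import pandas_udf

        spark = SparkSession.builder.getOrCreate()
        df = spark.range(3)  # LongType column "id": 0, 1, 2

        # Declared returnType is long, but the returned pandas.Series is float64.
        @pandas_udf("long")
        def plus_half(v):
            return v + 0.5  # float64 Series: 0.5, 1.5, 2.5

        # Depending on the Spark/PyArrow versions, the floats may be silently
        # truncated back to longs (0, 1, 2) instead of raising an error, which is
        # why the note asks users to check the results for accuracy.
        df.select(plus_half(df["id"])).show()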


---
