Posted to issues@spark.apache.org by "Maciej Szymkiewicz (JIRA)" <ji...@apache.org> on 2017/01/10 20:34:58 UTC

[jira] [Created] (SPARK-19161) Improving UDF Docstrings

Maciej Szymkiewicz created SPARK-19161:
------------------------------------------

             Summary: Improving UDF Docstrings
                 Key: SPARK-19161
                 URL: https://issues.apache.org/jira/browse/SPARK-19161
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark, SQL
    Affects Versions: 2.1.0, 2.0.0, 1.6.0, 1.5.0, 2.2.0
            Reporter: Maciej Szymkiewicz


Current state

Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide a meaningful docstring:

{code}
In [1]: from pyspark.sql.types import IntegerType

In [2]: from pyspark.sql.functions import udf

In [3]: def _add_one(x):
   ...:     """Adds one"""
   ...:     if x is not None:
   ...:         return x + 1
   ...:

In [4]: add_one = udf(_add_one, IntegerType())

In [5]: ?add_one
Type:        UserDefinedFunction
String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
Signature:   add_one(*cols)
Docstring:
User defined function in Python

.. versionadded:: 1.3

In [6]: help(add_one)

Help on UserDefinedFunction in module pyspark.sql.functions object:

class UserDefinedFunction(builtins.object)
 |  User defined function in Python
 |  
 |  .. versionadded:: 1.3
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *cols)
 |      Call self as a function.
 |  
 |  __del__(self)
 |  
 |  __init__(self, func, returnType, name=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
(END)

{code}

It is possible to retrieve the wrapped function through the `func` attribute:

{code}
In [7]: ?add_one.func

Signature: add_one.func(x)
Docstring: Adds one
File:      ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
Type:      function

In [8]: help(add_one.func)

Help on function _add_one in module __main__:

_add_one(x)
    Adds one
{code}

but this assumes that the end user is aware of the distinction between UDFs and built-in functions.
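
The fallback that produces this output can be reproduced without Spark. The sketch below uses a hypothetical stand-in for `UserDefinedFunction`: only the class docstring, the `func` and `returnType` attributes, and the `__call__(self, *cols)` signature come from the output above, and `__call__` returns a placeholder tuple instead of a `Column`. Since the instance has no `__doc__` of its own, attribute lookup falls back to the class docstring, and the signature is taken from `__call__`:

{code:python}
import inspect


class UserDefinedFunction:
    """User defined function in Python"""

    # Hypothetical stand-in mirroring only the parts of the PySpark class
    # that matter for documentation lookup.
    def __init__(self, func, returnType):
        self.func = func
        self.returnType = returnType

    def __call__(self, *cols):
        # Placeholder: the real PySpark class returns a Column expression here.
        return (self.func.__name__, cols)


def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


add_one = UserDefinedFunction(_add_one, "IntegerType")

# The instance has no __doc__ of its own, so lookup falls back to the class.
print(inspect.getdoc(add_one))       # User defined function in Python
# The user's docstring is only reachable through the wrapped function.
print(inspect.getdoc(add_one.func))  # Adds one
# The signature comes from __call__, not from the wrapped function.
print(inspect.signature(add_one))    # (*cols)
{code}

Only `add_one.func` carries the user's docstring and signature, so the information exists but is easy to miss.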

Proposed

Copy the input function's docstring (together with the SQL return type) to the UDF object or to a function wrapper:

{code}
In [1]: from pyspark.sql.types import IntegerType

In [2]: from pyspark.sql.functions import udf

In [3]: def _add_one(x):
   ...:     """Adds one"""
   ...:     if x is not None:
   ...:         return x + 1
   ...:

In [4]: add_one = udf(_add_one, IntegerType())

In [5]: ?add_one
Signature: add_one(x)
Docstring:
Adds one

SQL Type: IntegerType
File:      ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
Type:      function

In [6]: help(add_one)


Help on function _add_one in module __main__:

_add_one(x)
    Adds one
    
    SQL Type: IntegerType
(END)

{code}
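
One possible shape for the function-wrapper variant, sketched under stated assumptions: `udf`, `UserDefinedFunction` and `IntegerType` below are hypothetical stand-ins that do not import PySpark, and the `SQL Type:` line is built from the return type's class name only to match the proposed output. `functools.wraps` copies metadata such as `__name__`, `__doc__` and `__module__` from the user's function and sets `__wrapped__`, which `inspect.signature` follows, so the wrapper reports `(x)` instead of `(*cols)`:

{code:python}
import functools
import inspect


class UserDefinedFunction:
    """User defined function in Python"""

    # Hypothetical stand-in: the real PySpark class returns a Column when called.
    def __init__(self, func, returnType):
        self.func = func
        self.returnType = returnType

    def __call__(self, *cols):
        return (self.func.__name__, cols)  # placeholder result


class IntegerType:
    """Hypothetical stand-in for pyspark.sql.types.IntegerType."""


def udf(f, returnType):
    """Hypothetical udf() that returns a documented function wrapper."""
    udf_obj = UserDefinedFunction(f, returnType)

    # Copies __name__, __qualname__, __doc__, __module__ and sets __wrapped__.
    @functools.wraps(f)
    def wrapper(*cols):
        return udf_obj(*cols)

    # Normalize the user's docstring, then append the SQL return type.
    doc = inspect.cleandoc(f.__doc__) if f.__doc__ else ""
    type_note = "SQL Type: {}".format(type(returnType).__name__)
    wrapper.__doc__ = (doc + "\n\n" + type_note) if doc else type_note

    # Keep the original attributes reachable, as on the UDF object.
    wrapper.func = f
    wrapper.returnType = returnType
    return wrapper


def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


add_one = udf(_add_one, IntegerType())

print(inspect.signature(add_one))  # (x)
print(inspect.getdoc(add_one))     # "Adds one\n\nSQL Type: IntegerType"
{code}

Returning a plain function rather than the object lets `help()` and IPython document the UDF like any other Python function, while keeping `func` and `returnType` on the wrapper preserves attribute access such as `add_one.func`.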



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
