Posted to issues@spark.apache.org by "Maciej Szymkiewicz (JIRA)" <ji...@apache.org> on 2017/01/10 20:34:58 UTC
[jira] [Created] (SPARK-19161) Improving UDF Docstrings
Maciej Szymkiewicz created SPARK-19161:
------------------------------------------
Summary: Improving UDF Docstrings
Key: SPARK-19161
URL: https://issues.apache.org/jira/browse/SPARK-19161
Project: Spark
Issue Type: Sub-task
Components: PySpark, SQL
Affects Versions: 2.1.0, 2.0.0, 1.6.0, 1.5.0, 2.2.0
Reporter: Maciej Szymkiewicz
Current state
Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide a meaningful docstring:
{code}
In [1]: from pyspark.sql.types import IntegerType
In [2]: from pyspark.sql.functions import udf
In [3]: def _add_one(x):
   ...:     """Adds one"""
   ...:     if x is not None:
   ...:         return x + 1
   ...:
In [4]: add_one = udf(_add_one, IntegerType())
In [5]: ?add_one
Type: UserDefinedFunction
String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
File: ~/Spark/spark-2.0/python/pyspark/sql/functions.py
Signature: add_one(*cols)
Docstring:
User defined function in Python
.. versionadded:: 1.3
In [6]: help(add_one)
Help on UserDefinedFunction in module pyspark.sql.functions object:
class UserDefinedFunction(builtins.object)
| User defined function in Python
|
| .. versionadded:: 1.3
|
| Methods defined here:
|
| __call__(self, *cols)
| Call self as a function.
|
| __del__(self)
|
| __init__(self, func, returnType, name=None)
| Initialize self. See help(type(self)) for accurate signature.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
(END)
{code}
It is possible to extract the function:
{code}
In [7]: ?add_one.func
Signature: add_one.func(x)
Docstring: Adds one
File: ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
Type: function
In [8]: help(add_one.func)
Help on function _add_one in module __main__:
_add_one(x)
Adds one
{code}
but this assumes that the end user is aware of the distinction between UDFs and built-in functions.
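The fallback to the class docstring is ordinary Python behavior and can be reproduced without Spark. A minimal sketch, where `CallableWrapper` is a hypothetical stand-in and not the real `UserDefinedFunction`:

```python
class CallableWrapper(object):
    """Generic wrapper docstring"""

    def __init__(self, func):
        # Keep a reference to the wrapped function, as the UDF object does via `.func`.
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


wrapped = CallableWrapper(_add_one)

# The instance has no __doc__ of its own, so attribute lookup falls back to the class.
print(wrapped.__doc__)
# The original docstring is only reachable through the stored function.
print(wrapped.func.__doc__)
```

This is why `?add_one` and `help(add_one)` describe the wrapper class rather than the wrapped function.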
Proposed
Copy the input function's docstring to the UDF object or function wrapper.
{code}
In [1]: from pyspark.sql.types import IntegerType
In [2]: from pyspark.sql.functions import udf
In [3]: def _add_one(x):
   ...:     """Adds one"""
   ...:     if x is not None:
   ...:         return x + 1
   ...:
In [4]: add_one = udf(_add_one, IntegerType())
In [5]: ?add_one
Signature: add_one(x)
Docstring:
Adds one
SQL Type: IntegerType
File: ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
Type: function
In [6]: help(add_one)
Help on function _add_one in module __main__:
_add_one(x)
Adds one
SQL Type: IntegerType
(END)
{code}