Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/01/02 15:40:31 UTC

[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20137

    [SPARK-22939] [PySpark] Support Spark UDF in registerFunction [WIP]

    ## What changes were proposed in this pull request?
    ```python
    import random
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType, StringType
    random_udf = udf(lambda: int(random.random() * 100), IntegerType()).asNondeterministic()
    spark.catalog.registerFunction("random_udf", random_udf, StringType())
    spark.sql("SELECT random_udf()").collect()
    ```
    
    We will get the following error.
    ```
    Py4JError: An error occurred while calling o29.__getnewargs__. Trace:
    py4j.Py4JException: Method __getnewargs__([]) does not exist
    	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    	at py4j.Gateway.invoke(Gateway.java:274)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:214)
    	at java.lang.Thread.run(Thread.java:745)
    ```
    
    This PR adds support for it.
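    The `Py4JError` above appears to occur because pickling reaches the UDF object's live JVM proxy, which does not support the pickle protocol. A rough, Spark-free analogy (the class name is hypothetical; a lock stands in for the Py4J `JavaObject`):

```python
import pickle
import threading

# Hypothetical stand-in for a UDF object that keeps a reference to a live,
# non-serializable handle (a lock here, playing the role of a JVM proxy).
class UdfWithJvmHandle:
    def __init__(self):
        self.handle = threading.Lock()  # cannot be pickled

try:
    pickle.dumps(UdfWithJvmHandle())
except TypeError as err:
    print("pickling failed:", err)
```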
    
    ## How was this patch tested?
    WIP

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark registerFunction

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20137
    
----
commit 8216b6bb52082883fc9b212cd9ab21227f2b8491
Author: gatorsmile <ga...@...>
Date:   2018-01-02T15:28:19Z

    wip

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159445253
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -378,6 +378,23 @@ def test_udf2(self):
             [res] = self.spark.sql("SELECT strlen(a) FROM test WHERE strlen(a) > 1").collect()
             self.assertEqual(4, res[0])
     
    +    def test_non_deterministic_udf(self):
    +        import random
    +        from pyspark.sql.functions import udf
    +        random_udf = udf(lambda: random.randint(6, 6), IntegerType()).asNondeterministic()
    +        self.assertEqual(random_udf.deterministic, False)
    +        random_udf1 = self.spark.catalog.registerFunction("randInt", random_udf, StringType())
    +        self.assertEqual(random_udf1.deterministic, False)
    +        [row] = self.spark.sql("SELECT randInt()").collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf1()).collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf()).collect()
    +        self.assertEqual(row[0], 6)
    +        pydoc.render_doc(udf(lambda: random.randint(6, 6), IntegerType()))
    --- End diff --
    
    what does it do?


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85618/testReport)** for PR 20137 at commit [`f099261`](https://github.com/apache/spark/commit/f0992610854b95e4f1b9964bdf5c62132fd52c93).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    With this diff:
    
    ```diff
    --- a/python/pyspark/sql/udf.py
    +++ b/python/pyspark/sql/udf.py
    @@ -173,4 +173,4 @@ class UserDefinedFunction(object):
             .. versionadded:: 2.3
             """
             self._deterministic = False
    -        return self
    +        return self._wrapped()
    ```
    
    **After**
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer").asNondeterministic())
    ```
    
    ```
    Help on function <lambda> in module __main__:
    
    <lambda> lambda *args
    (END)
    ```
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer"))
    ```
    
    ```
    Help on function <lambda> in module __main__:
    
    <lambda> lambda *args
    (END)
    ```


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85619/testReport)** for PR 20137 at commit [`6ac25e6`](https://github.com/apache/spark/commit/6ac25e67bc345b35525b99d2e8659bb9554a0422).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159550091
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -130,14 +133,17 @@ def _create_judf(self):
             wrapped_func = _wrap_function(sc, self.func, self.returnType)
             jdt = spark._jsparkSession.parseDataType(self.returnType.json())
             judf = sc._jvm.org.apache.spark.sql.execution.python.UserDefinedPythonFunction(
    -            self._name, wrapped_func, jdt, self.evalType, self._deterministic)
    +            self._name, wrapped_func, jdt, self.evalType, self.deterministic)
             return judf
     
         def __call__(self, *cols):
             judf = self._judf
             sc = SparkContext._active_spark_context
             return Column(judf.apply(_to_seq(sc, cols, _to_java_column)))
     
    +    # This function is for improving the online help system in the interactive interpreter.
    +    # For example, the built-in help / pydoc.help. It wraps the UDF with the docstring and
    +    # argument annotation. (See: SPARK-19161)
    --- End diff --
    
    I do not want to expose these comments to the doc.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Hey @gatorsmile, I was just looking into this now. How about we add `_unwrapped` to the wrapped function, so that we return a wrapped function from a wrapped function and a `UserDefinedFunction` from a `UserDefinedFunction`? For example, roughly, in `udf.py`:
    
    ```diff
             wrapper.returnType = self.returnType
             wrapper.evalType = self.evalType
    -        wrapper.asNondeterministic = self.asNondeterministic
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    +        wrapper._unwrapped = lambda: self
             return wrapper
    ```
    
    and then we do something like:
    
    ```python
    if hasattr(f, "_unwrapped"):
        f = f._unwrapped()
    if isinstance(f, UserDefinedFunction):
        udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
        udf = udf if (f._deterministic) else udf.asNondeterministic()
    else:
        # Existing logic.
    ```
    
    Returning a `UserDefinedFunction` from the wrapped function via `asNondeterministic` actually seems to be an issue because it breaks pydoc, for example:
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer").asNondeterministic())
    ```
    
    I haven't tested the suggestion above, but I think it should roughly work and resolve both issues too.
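    To make the idea concrete, here is a minimal, Spark-free sketch of the suggested pattern (the names mirror `udf.py`, but everything below is only an illustration of the idea, not the actual implementation):

```python
# Illustration of the suggested `_unwrapped` pattern (not PySpark itself):
# the wrapper exposes `_unwrapped` so registration code can recover the
# underlying UserDefinedFunction and preserve its determinism flag.
class UserDefinedFunction:
    def __init__(self, func, deterministic=True):
        self.func = func
        self._deterministic = deterministic

    def asNondeterministic(self):
        self._deterministic = False
        return self

    def _wrapped(self):
        def wrapper(*args):
            return self.func(*args)
        wrapper.func = self.func
        # Delegate and re-wrap, so callers always get a plain function back.
        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
        wrapper._unwrapped = lambda: self
        return wrapper

def register_function(f):
    if hasattr(f, "_unwrapped"):
        f = f._unwrapped()
    if isinstance(f, UserDefinedFunction):
        new_udf = UserDefinedFunction(f.func, deterministic=f._deterministic)
        return new_udf._wrapped()
    # Plain Python function: wrap it as a fresh deterministic UDF.
    return UserDefinedFunction(f)._wrapped()

random_udf = UserDefinedFunction(lambda: 6)._wrapped().asNondeterministic()
registered = register_function(random_udf)
assert registered() == 6
assert registered._unwrapped()._deterministic is False
```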


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159599247
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType())
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    +        if hasattr(f, 'asNondeterministic'):
    +            udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
    +                                      evalType=PythonEvalType.SQL_BATCHED_UDF,
    --- End diff --
    
    +1, but I think there's no way to use a grouped map UDF in SQL syntax, if I understood correctly. I think we can safely fail fast for now as well.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85657/
    Test PASSed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159582570
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType())
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    +        if hasattr(f, 'asNondeterministic'):
    +            udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
    +                                      evalType=PythonEvalType.SQL_BATCHED_UDF,
    --- End diff --
    
    cc @ueshin @icexelloss , shall we support register pandas UDF here too?


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159506133
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -130,14 +133,17 @@ def _create_judf(self):
             wrapped_func = _wrap_function(sc, self.func, self.returnType)
             jdt = spark._jsparkSession.parseDataType(self.returnType.json())
             judf = sc._jvm.org.apache.spark.sql.execution.python.UserDefinedPythonFunction(
    -            self._name, wrapped_func, jdt, self.evalType, self._deterministic)
    +            self._name, wrapped_func, jdt, self.evalType, self.deterministic)
             return judf
     
         def __call__(self, *cols):
             judf = self._judf
             sc = SparkContext._active_spark_context
             return Column(judf.apply(_to_seq(sc, cols, _to_java_column)))
     
    +    # This function is for improving the online help system in the interactive interpreter.
    +    # For example, the built-in help / pydoc.help. It wraps the UDF with the docstring and
    +    # argument annotation. (See: SPARK-19161)
    --- End diff --
    
    I think we can put this in the docstring of `_wrapped`, between L148 and L150.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85619/testReport)** for PR 20137 at commit [`6ac25e6`](https://github.com/apache/spark/commit/6ac25e67bc345b35525b99d2e8659bb9554a0422).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85616/testReport)** for PR 20137 at commit [`35e6a4a`](https://github.com/apache/spark/commit/35e6a4a5ba2750c4bd4c4bcb3d91f16e6ba1fdea).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85618/testReport)** for PR 20137 at commit [`f099261`](https://github.com/apache/spark/commit/f0992610854b95e4f1b9964bdf5c62132fd52c93).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159580038
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
         @ignore_unicode_prefix
         @since(2.0)
         def registerFunction(self, name, f, returnType=StringType()):
    -        """Registers a python function (including lambda function) as a UDF
    +        """Registers a Python function (including lambda function) or a wrapped/native UDF
    --- End diff --
    
    SGTM


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85636/testReport)** for PR 20137 at commit [`78e9b2c`](https://github.com/apache/spark/commit/78e9b2c96204412e78ea1e50c95d52ffd6239228).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85617/
    Test FAILed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159551517
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -162,7 +168,8 @@ def wrapper(*args):
             wrapper.func = self.func
             wrapper.returnType = self.returnType
             wrapper.evalType = self.evalType
    -        wrapper.asNondeterministic = self.asNondeterministic
    +        wrapper.deterministic = self.deterministic
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    --- End diff --
    
    good to know the difference


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159647399
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType())
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    +        if hasattr(f, 'asNondeterministic'):
    +            udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
    +                                      evalType=PythonEvalType.SQL_BATCHED_UDF,
    --- End diff --
    
    Will support the pandas UDF as a separate PR. 


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85625/testReport)** for PR 20137 at commit [`85f11bf`](https://github.com/apache/spark/commit/85f11bfbfb564acb670097ff4ce520bfbc79b855).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85625/testReport)** for PR 20137 at commit [`85f11bf`](https://github.com/apache/spark/commit/85f11bfbfb564acb670097ff4ce520bfbc79b855).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85617/testReport)** for PR 20137 at commit [`3208136`](https://github.com/apache/spark/commit/320813638b710d26dcebfc004271397d7e76c43f).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85619/
    Test FAILed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159511213
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -378,6 +378,23 @@ def test_udf2(self):
             [res] = self.spark.sql("SELECT strlen(a) FROM test WHERE strlen(a) > 1").collect()
             self.assertEqual(4, res[0])
     
    +    def test_non_deterministic_udf(self):
    +        import random
    +        from pyspark.sql.functions import udf
    +        random_udf = udf(lambda: random.randint(6, 6), IntegerType()).asNondeterministic()
    +        self.assertEqual(random_udf.deterministic, False)
    +        random_udf1 = self.spark.catalog.registerFunction("randInt", random_udf, StringType())
    +        self.assertEqual(random_udf1.deterministic, False)
    +        [row] = self.spark.sql("SELECT randInt()").collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf1()).collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf()).collect()
    +        self.assertEqual(row[0], 6)
    +        pydoc.render_doc(udf(lambda: random.randint(6, 6), IntegerType()))
    --- End diff --
    
    Can we put this test there, or make it separate from `test_non_deterministic_udf`? Adding comments is also fine with me.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159577391
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
         @ignore_unicode_prefix
         @since(2.0)
         def registerFunction(self, name, f, returnType=StringType()):
    -        """Registers a python function (including lambda function) as a UDF
    +        """Registers a Python function (including lambda function) or a wrapped/native UDF
    --- End diff --
    
    I'm really confused when reading this document; it would be much clearer to me if we just said
    ```
    Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    ```
    This wrapping logic was added in https://github.com/apache/spark/pull/16534 ; is it really worth it?


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85607/testReport)** for PR 20137 at commit [`8216b6b`](https://github.com/apache/spark/commit/8216b6bb52082883fc9b212cd9ab21227f2b8491).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    but if we do
    
    ```diff
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    ```
    
    I think it will still show a proper pydoc ..


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85655/testReport)** for PR 20137 at commit [`09a1b89`](https://github.com/apache/spark/commit/09a1b89cd44349bcf67fd1214827608988787df6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Can you run the command?
    ```
    help(udf(lambda: 1, "integer").asNondeterministic())
    ```



---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85616/
    Test FAILed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159549932
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction(
    +        ...     "random_udf", random_udf, StringType())  # doctest: +SKIP
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        if hasattr(f, 'asNondeterministic'):
    --- End diff --
    
    will add a comment.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85607/testReport)** for PR 20137 at commit [`8216b6b`](https://github.com/apache/spark/commit/8216b6bb52082883fc9b212cd9ab21227f2b8491).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159448813
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -378,6 +378,23 @@ def test_udf2(self):
             [res] = self.spark.sql("SELECT strlen(a) FROM test WHERE strlen(a) > 1").collect()
             self.assertEqual(4, res[0])
     
    +    def test_non_deterministic_udf(self):
    +        import random
    +        from pyspark.sql.functions import udf
    +        random_udf = udf(lambda: random.randint(6, 6), IntegerType()).asNondeterministic()
    +        self.assertEqual(random_udf.deterministic, False)
    +        random_udf1 = self.spark.catalog.registerFunction("randInt", random_udf, StringType())
    +        self.assertEqual(random_udf1.deterministic, False)
    +        [row] = self.spark.sql("SELECT randInt()").collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf1()).collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf()).collect()
    +        self.assertEqual(row[0], 6)
    +        pydoc.render_doc(udf(lambda: random.randint(6, 6), IntegerType()))
    --- End diff --
    
    This is to test a help function. See https://github.com/gatorsmile/spark/blob/85f11bfbfb564acb670097ff4ce520bfbc79b855/python/pyspark/sql/tests.py#L1681-L1688


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Ur, @gatorsmile, then we will return a wrapped function from `UserDefinedFunction().asNondeterministic`. Mind if I ask you to elaborate on why? I thought `UserDefinedFunction` should still return a `UserDefinedFunction`.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85618/
    Test PASSed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85608/testReport)** for PR 20137 at commit [`e8d0a4c`](https://github.com/apache/spark/commit/e8d0a4c7c8c9e81fd420195d3cc1a37a3b8459a3).


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159551657
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    --- End diff --
    
    ok


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85617/testReport)** for PR 20137 at commit [`3208136`](https://github.com/apache/spark/commit/320813638b710d26dcebfc004271397d7e76c43f).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Take your time. I will not be online in the next two hours.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159506607
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction(
    +        ...     "random_udf", random_udf, StringType())  # doctest: +SKIP
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        if hasattr(f, 'asNondeterministic'):
    --- End diff --
    
    Actually, this one made me suggest the `wrapper._unwrapped = lambda: self` approach.
    
    So, here `f` can be either a wrapped function or a `UserDefinedFunction`, and I thought it's not quite clear what we expect here from `hasattr(f, 'asNondeterministic')`.
    
    Could we at least leave some comments saying that this can be both the wrapped function for a `UserDefinedFunction` and a `UserDefinedFunction` itself?
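The duck-typing concern above can be sketched in plain Python. The class below is a minimal mock, not the actual PySpark `UserDefinedFunction`; it only models the attributes relevant to the check, to show why `hasattr(f, 'asNondeterministic')` matches both the UDF object and its wrapped function:

```python
import functools

# Minimal stand-in for pyspark.sql.udf.UserDefinedFunction (assumption:
# only the attributes relevant to the hasattr() check are modeled).
class MockUserDefinedFunction:
    def asNondeterministic(self):
        self.deterministic = False
        return self

    def _wrapped(self):
        @functools.wraps(self.asNondeterministic)
        def wrapper(*args):
            return None
        # The wrapper carries the same attribute as the UDF object itself,
        # so a hasattr() check alone cannot tell the two apart.
        wrapper.asNondeterministic = self.asNondeterministic
        return wrapper

udf_obj = MockUserDefinedFunction()
wrapped = udf_obj._wrapped()
print(hasattr(udf_obj, 'asNondeterministic'))  # True for the UDF object
print(hasattr(wrapped, 'asNondeterministic'))  # True for the wrapper too
```

This is why a comment at the call site helps: the check intentionally accepts both shapes.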


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159577480
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -162,7 +168,8 @@ def wrapper(*args):
             wrapper.func = self.func
             wrapper.returnType = self.returnType
             wrapper.evalType = self.evalType
    -        wrapper.asNondeterministic = self.asNondeterministic
    +        wrapper.deterministic = self.deterministic
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    --- End diff --
    
    Definitely. Will give it a try within the following week, though...


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    I am not against anything, but the outputs of the following two are inconsistent. It looks confusing to end users.
    
    ```
    help(udf(lambda: 1, "integer").asNondeterministic())
    help(udf(lambda: 1, "integer"))
    ```


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85625/
    Test PASSed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Thanks! Merged to master and 2.3


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    > BTW, this PR is not just for `asNondeterministic()`. We have the same issue for the deterministic UDFs.
    
    Yup, the fix for deterministic UDFs seems fine, but the change around `asNondeterministic()` bugs me.
    
    If you meant the docstring about `asNondeterministic` itself (not the wrapped function instance as above),
    
    I think we can do the following:
    
    ```python
           wrapper.asNondeterministic = functools.wraps(
               self.asNondeterministic)(lambda: self.asNondeterministic()._wrapped())
    ```
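As a self-contained illustration of why `functools.wraps` helps here (plain Python, no Spark required; the function name below is made up for the example), wrapping a lambda with `functools.wraps` copies the original function's metadata, so `help()`/pydoc output stays consistent:

```python
import functools

def as_nondeterministic():
    """Updates the function to nondeterministic (example docstring)."""

# A bare lambda has no useful metadata, so help() on it is unhelpful.
bare = lambda: as_nondeterministic()
assert bare.__doc__ is None

# functools.wraps copies __doc__, __name__, etc. from the original
# function, so help() on the wrapper shows the original documentation.
wrapped = functools.wraps(as_nondeterministic)(lambda: as_nondeterministic())
print(wrapped.__name__)             # as_nondeterministic
print(wrapped.__doc__ is not None)  # True
```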



---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Looks fine to me otherwise BTW.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159693340
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType())
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    +        if hasattr(f, 'asNondeterministic'):
    +            udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
    +                                      evalType=PythonEvalType.SQL_BATCHED_UDF,
    --- End diff --
    
    +1 too


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85655/
    Test PASSed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159509498
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -172,5 +179,5 @@ def asNondeterministic(self):
     
             .. versionadded:: 2.3
             """
    -        self._deterministic = False
    +        self.deterministic = False
    --- End diff --
    
    Can we call it `udfDeterministic` to be consistent with the Scala side?
    
    https://github.com/apache/spark/blob/ff48b1b338241039a7189e7a3c04333b1256fdb3/sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala#L33
    
    The opposite works for me too.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159448650
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction(
    +        ...     "random_udf", random_udf, StringType())  # doctest: +SKIP
    --- End diff --
    
    The output contains a hex value. 


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    cc @mgaido91 since you touched related codes lately.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159582683
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType())
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    +        if hasattr(f, 'asNondeterministic'):
    +            udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
    +                                      evalType=PythonEvalType.SQL_BATCHED_UDF,
    --- End diff --
    
    seems we can support it by just changing `evalType=PythonEvalType.SQL_BATCHED_UDF` to `evalType=f.evalType`
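The suggested change can be sketched with plain-Python mocks (the constants and classes below are stand-ins, not the actual PySpark API): forwarding `f.evalType` instead of hard-coding `SQL_BATCHED_UDF` preserves the eval type of, e.g., a pandas UDF:

```python
# Stand-in eval-type constants (assumption: the real PythonEvalType values
# differ; only the propagation logic matters for this sketch).
SQL_BATCHED_UDF = 100
SQL_PANDAS_SCALAR_UDF = 200

class MockUDF:
    def __init__(self, func, evalType, deterministic=True):
        self.func = func
        self.evalType = evalType
        self.deterministic = deterministic

def register_function(f):
    # Hard-coding evalType=SQL_BATCHED_UDF here would silently downgrade
    # a pandas UDF; forwarding f.evalType keeps the original eval type.
    return MockUDF(f.func, evalType=f.evalType, deterministic=f.deterministic)

pandas_udf = MockUDF(lambda: 1, SQL_PANDAS_SCALAR_UDF)
registered = register_function(pandas_udf)
print(registered.evalType == SQL_PANDAS_SCALAR_UDF)  # True
```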


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159576756
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -162,7 +168,8 @@ def wrapper(*args):
             wrapper.func = self.func
             wrapper.returnType = self.returnType
             wrapper.evalType = self.evalType
    -        wrapper.asNondeterministic = self.asNondeterministic
    +        wrapper.deterministic = self.deterministic
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    --- End diff --
    
    I will leave this unchanged. Maybe you can submit a follow-up PR to address it?


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159600332
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType())
    +        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'82')]
    +        >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: +SKIP
    +        [Row(random_udf()=u'62')]
             """
    -        udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                  evalType=PythonEvalType.SQL_BATCHED_UDF)
    +
    +        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    +        if hasattr(f, 'asNondeterministic'):
    +            udf = UserDefinedFunction(f.func, returnType=returnType, name=name,
    +                                      evalType=PythonEvalType.SQL_BATCHED_UDF,
    --- End diff --
    
    SGTM


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159579886
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
         @ignore_unicode_prefix
         @since(2.0)
         def registerFunction(self, name, f, returnType=StringType()):
    -        """Registers a python function (including lambda function) as a UDF
    +        """Registers a Python function (including lambda function) or a wrapped/native UDF
    --- End diff --
    
    BTW, to be honest, I remember giving this several quick tries at the time to get rid of the wrapper while keeping the docstring correct, but I failed to come up with a good alternative.
    
    Might be good to see if there is a cleverer way to get rid of the wrapper but keep the doc.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159505249
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction(
    +        ...     "random_udf", random_udf, StringType())  # doctest: +SKIP
    --- End diff --
    
    BTW, I think we can remove `# doctest: +SKIP` for this line because this line simply assigns a value to  `newRandom_udf`?


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85636/testReport)** for PR 20137 at commit [`78e9b2c`](https://github.com/apache/spark/commit/78e9b2c96204412e78ea1e50c95d52ffd6239228).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85657/testReport)** for PR 20137 at commit [`2482e6b`](https://github.com/apache/spark/commit/2482e6bcdaf92a78ae6b043a859e10140a273a18).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159507510
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -162,7 +168,8 @@ def wrapper(*args):
             wrapper.func = self.func
             wrapper.returnType = self.returnType
             wrapper.evalType = self.evalType
    -        wrapper.asNondeterministic = self.asNondeterministic
    +        wrapper.deterministic = self.deterministic
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    --- End diff --
    
    Can we do:
    
    ```python
           wrapper.asNondeterministic = functools.wraps(
               self.asNondeterministic)(lambda: self.asNondeterministic()._wrapped())
    ```
    
    So that it can produce a proper pydoc when we do `help(udf(lambda: 1, "integer").asNondeterministic)` (not `help(udf(lambda: 1, "integer").asNondeterministic())`).
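
The effect of the `functools.wraps` suggestion above can be sketched standalone (hypothetical `Udf` class, for illustration only; not the PR's actual code): wrapping the lambda copies `__name__`/`__doc__` from `asNondeterministic`, so `help()` shows the method's pydoc instead of an anonymous lambda.

```python
import functools


class Udf(object):
    """Hypothetical stand-in for UserDefinedFunction."""

    def asNondeterministic(self):
        """Updates UserDefinedFunction to nondeterministic."""
        self.deterministic = False
        return self

    def _wrapped(self):
        return self


udf = Udf()

# Without functools.wraps, the lambda carries no docstring.
plain = lambda: udf.asNondeterministic()._wrapped()

# With functools.wraps, the lambda inherits the method's metadata.
wrapped = functools.wraps(udf.asNondeterministic)(
    lambda: udf.asNondeterministic()._wrapped())

print(plain.__doc__)    # None
print(wrapped.__name__)  # asNondeterministic
```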



---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Thank you for bearing with me @gatorsmile.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85608/
    Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    With this diff:
    
    ```diff
    diff --git a/python/pyspark/sql/udf.py b/python/pyspark/sql/udf.py
    index 54b5a8656e1..24de9839e90 100644
    --- a/python/pyspark/sql/udf.py
    +++ b/python/pyspark/sql/udf.py
    @@ -162,7 +162,8 @@ class UserDefinedFunction(object):
             wrapper.func = self.func
             wrapper.returnType = self.returnType
             wrapper.evalType = self.evalType
    -        wrapper.asNondeterministic = self.asNondeterministic
    +        wrapper.asNondeterministic = lambda: self.asNondeterministic()._wrapped()
    +        wrapper._unwrapped = lambda: self
    
             return wrapper
    ```
    
    **Before**
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer").asNondeterministic())
    ```
    
    ```
    Help on UserDefinedFunction in module pyspark.sql.udf object:
    
    class UserDefinedFunction(__builtin__.object)
     |  User defined function in Python
     |
     |  .. versionadded:: 1.3
     |
     |  Methods defined here:
     |
     |  __call__(self, *cols)
     |
     |  __init__(self, func, returnType=StringType, name=None, evalType=100)
     |
     |  asNondeterministic(self)
     |      Updates UserDefinedFunction to nondeterministic.
     |
     |      .. versionadded:: 2.3
     |
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |
     |  __dict__
     |      dictionary for instance variables (if defined)
     |
     |  __weakref__
     |      list of weak references to the object (if defined)
     |
    :
    ```
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer"))
    ```
    
    ```
    Help on function <lambda> in module __main__:
    
    <lambda> lambda *args
    (END)
    ```
    
    **After**
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer").asNondeterministic())
    ```
    
    ```
    Help on function <lambda> in module __main__:
    
    <lambda> lambda *args
    (END)
    ```
    
    ```python
    from pyspark.sql.functions import udf
    help(udf(lambda: 1, "integer"))
    ```
    
    ```
    Help on function <lambda> in module __main__:
    
    <lambda> lambda *args
    (END)
    ```



---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159506888
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    --- End diff --
    
    Let's fix the doc for this too. It says `:param f: python function`, but we could describe that it also takes a Python native function, a wrapped function, and a `UserDefinedFunction`.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Let me test it and be back soon.


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20137


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85607/
    Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    @HyukjinKwon Thank you for your comment!
    
    cc @ueshin @cloud-fan 


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85616/testReport)** for PR 20137 at commit [`35e6a4a`](https://github.com/apache/spark/commit/35e6a4a5ba2750c4bd4c4bcb3d91f16e6ba1fdea).


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159579617
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
         @ignore_unicode_prefix
         @since(2.0)
         def registerFunction(self, name, f, returnType=StringType()):
    -        """Registers a python function (including lambda function) as a UDF
    +        """Registers a Python function (including lambda function) or a wrapped/native UDF
    --- End diff --
    
    Another idea just in case it helps:
    
    ```
    Registers a Python function as a UDF or a user defined function.
    ```



---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159578328
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
         @ignore_unicode_prefix
         @since(2.0)
         def registerFunction(self, name, f, returnType=StringType()):
    -        """Registers a python function (including lambda function) as a UDF
    +        """Registers a Python function (including lambda function) or a wrapped/native UDF
    --- End diff --
    
    It indeed added some complexity. However, I believe nothing is blocked by #16534 now, if I understand correctly.
    
    The changes in #16534 are quite nice because IMHO Python folks probably use `help()` and `dir()` more frequently than reading the API docs on the website. For a set of UDFs provided as a library, I think that's well worth keeping.
    
    How about leaving this wrapper logic as is for now and bringing this discussion back when something is actually blocked (or becomes too complicated) by it?



---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    LGTM


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159443510
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction(
    +        ...     "random_udf", random_udf, StringType())  # doctest: +SKIP
    --- End diff --
    
    why skip the test? we can use a fixed seed
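    
A fixed seed, as suggested, makes the doctest result reproducible within a run, so the output can be asserted instead of skipped. A minimal sketch (illustration only, not the PR's actual doctest):

```python
import random

# Seeding before each call pins down what random.randint returns,
# so two seeded calls produce the same value.
random.seed(0)
first = random.randint(0, 100)
random.seed(0)
second = random.randint(0, 100)
print(first == second)  # True
```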


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    @HyukjinKwon We need to fix `asNondeterministic`:
    ```
        def asNondeterministic(self):
            """
            Updates UserDefinedFunction to nondeterministic.
    
            .. versionadded:: 2.3
            """
            self._deterministic = False
            return self._wrapped()
    ```
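    
The chaining this fix enables can be sketched with a minimal stand-in class (hypothetical names, illustration only): flipping the flag and returning the wrapped function lets callers write `udf(...).asNondeterministic()` and still get back a callable that carries the `deterministic` metadata.

```python
class MiniUdf(object):
    """Hypothetical, stripped-down stand-in for UserDefinedFunction."""

    def __init__(self, func):
        self.func = func
        self.deterministic = True

    def _wrapped(self):
        # Expose the UDF as a plain callable, copying metadata onto it.
        def wrapper(*args):
            return self.func(*args)
        wrapper.deterministic = self.deterministic
        return wrapper

    def asNondeterministic(self):
        self.deterministic = False
        return self._wrapped()


f = MiniUdf(lambda: 1).asNondeterministic()
print(f.deterministic)  # False
print(f())              # 1
```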


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159550505
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -172,5 +179,5 @@ def asNondeterministic(self):
     
             .. versionadded:: 2.3
             """
    -        self._deterministic = False
    +        self.deterministic = False
    --- End diff --
    
    `deterministic` is used in `UserDefinedFunction.scala`. Users can use it to check whether this UDF is deterministic or not.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85655/testReport)** for PR 20137 at commit [`09a1b89`](https://github.com/apache/spark/commit/09a1b89cd44349bcf67fd1214827608988787df6).


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85636/
    Test PASSed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85608/testReport)** for PR 20137 at commit [`e8d0a4c`](https://github.com/apache/spark/commit/e8d0a4c7c8c9e81fd420195d3cc1a37a3b8459a3).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20137
  
    **[Test build #85657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85657/testReport)** for PR 20137 at commit [`2482e6b`](https://github.com/apache/spark/commit/2482e6bcdaf92a78ae6b043a859e10140a273a18).


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159551571
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -378,6 +378,23 @@ def test_udf2(self):
             [res] = self.spark.sql("SELECT strlen(a) FROM test WHERE strlen(a) > 1").collect()
             self.assertEqual(4, res[0])
     
    +    def test_non_deterministic_udf(self):
    +        import random
    +        from pyspark.sql.functions import udf
    +        random_udf = udf(lambda: random.randint(6, 6), IntegerType()).asNondeterministic()
    +        self.assertEqual(random_udf.deterministic, False)
    +        random_udf1 = self.spark.catalog.registerFunction("randInt", random_udf, StringType())
    +        self.assertEqual(random_udf1.deterministic, False)
    +        [row] = self.spark.sql("SELECT randInt()").collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf1()).collect()
    +        self.assertEqual(row[0], "6")
    +        [row] = self.spark.range(1).select(random_udf()).collect()
    +        self.assertEqual(row[0], 6)
    +        pydoc.render_doc(udf(lambda: random.randint(6, 6), IntegerType()))
    --- End diff --
    
    will add a comment.
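    
As an aside, what the `pydoc.render_doc` line in the diff above guards against can be shown standalone (hypothetical function, not PR code): `render_doc` raises if it cannot build documentation, so calling it doubles as a smoke test that a wrapped object is still introspectable.

```python
import pydoc


def f():
    """Docstring survives wrapping."""
    return 1


# render_doc returns the rendered help text; the docstring should
# appear in it verbatim.
text = pydoc.render_doc(f)
print("Docstring survives wrapping." in text)  # True
```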


---



[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159549810
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
             >>> spark.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
    +
    +        >>> import random
    +        >>> from pyspark.sql.functions import udf
    +        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +        >>> newRandom_udf = spark.catalog.registerFunction(
    +        ...     "random_udf", random_udf, StringType())  # doctest: +SKIP
    --- End diff --
    
    `newRandom_udf` is also used.


---
