Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/01/17 04:15:12 UTC

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20288

    [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark

    ## What changes were proposed in this pull request?
    
    This PR proposes to deprecate `register*` for UDFs in `SQLContext` and `Catalog` in Spark 2.3.0.
    
    These are inconsistent with the Scala / Java APIs, and they basically do the same thing as `spark.udf.register*`.
    
    Also, this PR moves the logic from `[sqlContext|spark.catalog].register*` to `spark.udf.register*` and reuses the docstring.
    
    ## How was this patch tested?
    
    Manually tested, manually checked the API documentation, and added tests to check that the deprecated APIs call the aliases correctly.
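
    For reference, a minimal sketch of the change from a user's point of view (the UDF name and the lambda below are illustrative, not taken from the patch):

        # Deprecated in 2.3.0: registering through SQLContext / Catalog.
        strlen = sqlContext.registerFunction("strlen", lambda x: len(x))
        strlen = spark.catalog.registerFunction("strlen", lambda x: len(x))

        # Preferred: registering through spark.udf; same behavior, one entry point.
        strlen = spark.udf.register("strlen", lambda x: len(x))
        spark.sql("SELECT strlen('test')").collect()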

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark deprecate-udf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20288
    
----
commit f63105c7faddc79ccd624c9234b56916efec3569
Author: hyukjinkwon <gu...@...>
Date:   2018-01-17T02:49:08Z

    Deprecate register* for UDFs in SQLContext and Catalog in PySpark

----


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86273/testReport)** for PR 20288 at commit [`08ffa1c`](https://github.com/apache/spark/commit/08ffa1ca2c332205eea370e4d3ce0489eb97424a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86309/testReport)** for PR 20288 at commit [`e121273`](https://github.com/apache/spark/commit/e121273972d0ec0d94cc01e4426358b4e5fb7e2c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162230241
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -172,113 +173,29 @@ def range(self, start, end=None, step=1, numPartitions=None):
             """
             return self.sparkSession.range(start, end, step, numPartitions)
     
    -    @ignore_unicode_prefix
         @since(1.2)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
    -        >>> sqlContext.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = sqlContext.udf.register("slen", slen)
    -        >>> sqlContext.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
    -        >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    +        """An alias for :func:`spark.udf.register`.
    +        See :meth:`pyspark.sql.UDFRegistration.register`.
    +
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    --- End diff --
    
    It shows the doc as below:
    
    ![2018-01-18 10 28 46](https://user-images.githubusercontent.com/6477701/35076515-379756f4-fc3c-11e7-99db-447fb466c626.png)
    
    I checked the link; `pyspark.sql.UDFRegistration.register` is correct.



---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161964738
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
             self.stop()
     
     
    +class UDFRegistration(object):
    --- End diff --
    
    shall we put it in `udf.py`?


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Just fixed minor doc nits and double-checked the built API documentation.


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161969687
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -147,7 +147,8 @@ def udf(self):
     
             :return: :class:`UDFRegistration`
             """
    -        return UDFRegistration(self)
    +        from pyspark.sql.session import UDFRegistration
    +        return UDFRegistration(self.sparkSession)
    --- End diff --
    
    How about `return self.sparkSession.udf`?


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161966507
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
             self.stop()
     
     
    +class UDFRegistration(object):
    +    """Wrapper for user-defined function registration."""
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    def register(self, name, f, returnType=None):
    +        """Registers a Python function (including lambda function) or a user-defined function
    +        in SQL statements.
    +
    +        :param name: name of the user-defined function in SQL statements.
    +        :param f: a Python function, or a user-defined function. The user-defined function can
    +            be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
    +            :meth:`pyspark.sql.functions.pandas_udf`.
    +        :param returnType: the return type of the registered user-defined function.
    +        :return: a user-defined function.
    +
    +        `returnType` can be optionally specified when `f` is a Python function but not
    +        when `f` is a user-defined function. See below:
    +
    +        1. When `f` is a Python function, `returnType` defaults to string type and can be
    +        optionally specified. The produced object must match the specified type. In this case,
    +        this API works as if `register(name, f, returnType=StringType())`.
    +
    +            >>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
    +            >>> spark.sql("SELECT stringLengthString('test')").collect()
    +            [Row(stringLengthString(test)=u'4')]
    +
    +            >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    +            [Row(stringLengthString(text)=u'3')]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +        2. When `f` is a user-defined function, Spark uses the return type of the given a
    --- End diff --
    
    of the given a user-defined function -> of the given user-defined function



---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86273/testReport)** for PR 20288 at commit [`08ffa1c`](https://github.com/apache/spark/commit/08ffa1ca2c332205eea370e4d3ce0489eb97424a).


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161966469
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
             self.stop()
     
     
    +class UDFRegistration(object):
    +    """Wrapper for user-defined function registration."""
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    def register(self, name, f, returnType=None):
    +        """Registers a Python function (including lambda function) or a user-defined function
    +        in SQL statements.
    +
    +        :param name: name of the user-defined function in SQL statements.
    +        :param f: a Python function, or a user-defined function. The user-defined function can
    +            be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
    +            :meth:`pyspark.sql.functions.pandas_udf`.
    +        :param returnType: the return type of the registered user-defined function.
    +        :return: a user-defined function.
    +
    +        `returnType` can be optionally specified when `f` is a Python function but not
    +        when `f` is a user-defined function. See below:
    +
    +        1. When `f` is a Python function, `returnType` defaults to string type and can be
    +        optionally specified. The produced object must match the specified type. In this case,
    +        this API works as if `register(name, f, returnType=StringType())`.
    +
    +            >>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
    +            >>> spark.sql("SELECT stringLengthString('test')").collect()
    +            [Row(stringLengthString(test)=u'4')]
    +
    +            >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    +            [Row(stringLengthString(text)=u'3')]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +        2. When `f` is a user-defined function, Spark uses the return type of the given a
    +        user-defined function as the return type of the registered a user-defined function.
    --- End diff --
    
    the registered a user-defined function -> the registered user-defined function


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Will try to double-check and clean up soon. The comments above all look valid.


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162229716
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -29,9 +29,10 @@
     from pyspark.sql.readwriter import DataFrameReader
     from pyspark.sql.streaming import DataStreamReader
     from pyspark.sql.types import IntegerType, Row, StringType
    +from pyspark.sql.udf import UDFRegistration
    --- End diff --
    
    I intentionally kept this to retain the import path `pyspark.sql.context.UDFRegistration`, just in case.
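
    A minimal sketch of the back-compat re-export this refers to (the `noqa` marker is illustrative, not from the diff):

        # python/pyspark/sql/context.py
        # Keeping this import means `from pyspark.sql.context import UDFRegistration`
        # still works even though the class now lives in pyspark.sql.udf.
        from pyspark.sql.udf import UDFRegistration  # noqa: F401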


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86264/testReport)** for PR 20288 at commit [`6b9b9c4`](https://github.com/apache/spark/commit/6b9b9c44ea7cafa7e1fb607bcf5a2d19336f31f4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class UDFRegistration(object):`


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86234/
    Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86306/testReport)** for PR 20288 at commit [`3e0147b`](https://github.com/apache/spark/commit/3e0147bd11b980d91a2b628b85c5d6a05391b28e).


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162089110
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -224,92 +224,20 @@ def dropGlobalTempView(self, viewName):
             """
             self._jcatalog.dropGlobalTempView(viewName)
     
    -    @ignore_unicode_prefix
    -    @since(2.0)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = spark.catalog.registerFunction("stringLengthString", len)
    -        >>> spark.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = spark.udf.register("slen", slen)
    -        >>> spark.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
    -        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -
    -        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    -        if hasattr(f, 'asNondeterministic'):
    -            if returnType is not None:
    -                raise TypeError(
    -                    "Invalid returnType: None is expected when f is a UserDefinedFunction, "
    -                    "but got %s." % returnType)
    -            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    -                raise ValueError(
    -                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    -            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    -                                               evalType=f.evalType,
    -                                               deterministic=f.deterministic)
    -            return_udf = f
    -        else:
    -            if returnType is None:
    -                returnType = StringType()
    -            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    -            return_udf = register_udf._wrapped()
    -        self._jsparkSession.udf().registerPython(name, register_udf._judf)
    -        return return_udf
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self._sparkSession.udf.register(name, f, returnType)
    +    # Reuse the docstring from UDFRegistration but with few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    --- End diff --
    
    An alternative is to do something like:
    
    ```
    An alias for :func:`spark.udf.register`
    
    .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    .. versionadded:: 2.0
    ``` 
    So we don't need to copy the docstring.
    
    But I am fine either way. @HyukjinKwon you can decide.


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162035869
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -181,3 +183,179 @@ def asNondeterministic(self):
             """
             self.deterministic = False
             return self
    +
    +
    +class UDFRegistration(object):
    +    """
    +    Wrapper for user-defined function registration.
    +
    +    .. versionadded:: 1.3.1
    +    """
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    @since(1.3)
    +    def register(self, name, f, returnType=None):
    +        """Registers a Python function (including lambda function) or a user-defined function
    +        in SQL statements.
    +
    +        :param name: name of the user-defined function in SQL statements.
    +        :param f: a Python function, or a user-defined function. The user-defined function can
    +            be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
    +            :meth:`pyspark.sql.functions.pandas_udf`.
    +        :param returnType: the return type of the registered user-defined function.
    +        :return: a user-defined function.
    +
    +        `returnType` can be optionally specified when `f` is a Python function but not
    +        when `f` is a user-defined function. Please see below.
    +
    +        1. When `f` is a Python function:
    +
    +            `returnType` defaults to string type and can be optionally specified. The produced
    +            object must match the specified type. In this case, this API works as if
    +            `register(name, f, returnType=StringType())`.
    +
    +            >>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
    +            >>> spark.sql("SELECT stringLengthString('test')").collect()
    +            [Row(stringLengthString(test)=u'4')]
    +
    +            >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    +            [Row(stringLengthString(text)=u'3')]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +        2. When `f` is a user-defined function:
    +
    +            Spark uses the return type of the given user-defined function as the return type of
    +            the registered user-defined function. `returnType` should not be specified.
    +            In this case, this API works as if `register(name, f)`.
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> from pyspark.sql.functions import udf
    +            >>> slen = udf(lambda s: len(s), IntegerType())
    +            >>> _ = spark.udf.register("slen", slen)
    +            >>> spark.sql("SELECT slen('test')").collect()
    +            [Row(slen(test)=4)]
    +
    +            >>> import random
    +            >>> from pyspark.sql.functions import udf
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +            >>> new_random_udf = spark.udf.register("random_udf", random_udf)
    +            >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +            [Row(random_udf()=82)]
    +
    +            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +            >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    +            ... def add_one(x):
    +            ...     return x + 1
    +            ...
    +            >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    +            >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    +            [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    +
    +            .. note:: Registration for a user-defined function (case 2.) was added from
    +                Spark 2.3.0.
    +        """
    +
    +        # This is to check whether the input function is from a user-defined function or
    +        # Python function.
    +        if hasattr(f, 'asNondeterministic'):
    +            if returnType is not None:
    +                raise TypeError(
    +                    "Invalid returnType: data type can not be specified when f is"
    +                    "a user-defined function, but got %s." % returnType)
    +            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    +                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    +                raise ValueError(
    +                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    +            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    +                                               evalType=f.evalType,
    +                                               deterministic=f.deterministic)
    +            return_udf = f
    +        else:
    +            if returnType is None:
    +                returnType = StringType()
    +            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    +                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    +            return_udf = register_udf._wrapped()
    +        self.sparkSession._jsparkSession.udf().registerPython(name, register_udf._judf)
    +        return return_udf
    +
    +    @ignore_unicode_prefix
    +    @since(2.3)
    +    def registerJavaFunction(self, name, javaClassName, returnType=None):
    +        """Register a Java user-defined function so it can be used in SQL statements.
    +
    +        In addition to a name and the function itself, the return type can be optionally specified.
    +        When the return type is not specified we would infer it via reflection.
    +
    +        :param name:  name of the user-defined function
    +        :param javaClassName: fully qualified name of java class
    +        :param returnType: a :class:`pyspark.sql.types.DataType` object
    +
    +        >>> from pyspark.sql.types import IntegerType
    +        >>> spark.udf.registerJavaFunction("javaStringLength",
    +        ...   "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    +        >>> spark.sql("SELECT javaStringLength('test')").collect()
    +        [Row(UDF:javaStringLength(test)=4)]
    +        >>> spark.udf.registerJavaFunction("javaStringLength2",
    +        ...   "test.org.apache.spark.sql.JavaStringLength")
    +        >>> spark.sql("SELECT javaStringLength2('test')").collect()
    +        [Row(UDF:javaStringLength2(test)=4)]
    +        """
    --- End diff --
    
    <img width="560" alt="2018-01-17 9 23 28" src="https://user-images.githubusercontent.com/6477701/35042749-26db2116-fbcd-11e7-840b-635d019c6ccf.png">



---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86309/
    Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86247/testReport)** for PR 20288 at commit [`08438ee`](https://github.com/apache/spark/commit/08438ee7d8c209a2dcb3eb4efeeef77451feb8d7).


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86274/
    Test PASSed.


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161965369
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
             self.stop()
     
     
    +class UDFRegistration(object):
    +    """Wrapper for user-defined function registration."""
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    def register(self, name, f, returnType=None):
    --- End diff --
    
    shall we add `since 2.3`?


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86234/testReport)** for PR 20288 at commit [`f63105c`](https://github.com/apache/spark/commit/f63105c7faddc79ccd624c9234b56916efec3569).


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86308/testReport)** for PR 20288 at commit [`c9512a6`](https://github.com/apache/spark/commit/c9512a66800709417425c0d348c9327ed681420d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86265/testReport)** for PR 20288 at commit [`f1fe40a`](https://github.com/apache/spark/commit/f1fe40a5afe876cf3b81208af7bc1cd379bcb732).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162198033
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -224,92 +224,20 @@ def dropGlobalTempView(self, viewName):
             """
             self._jcatalog.dropGlobalTempView(viewName)
     
    -    @ignore_unicode_prefix
    -    @since(2.0)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = spark.catalog.registerFunction("stringLengthString", len)
    -        >>> spark.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = spark.udf.register("slen", slen)
    -        >>> spark.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
    -        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -
    -        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    -        if hasattr(f, 'asNondeterministic'):
    -            if returnType is not None:
    -                raise TypeError(
    -                    "Invalid returnType: None is expected when f is a UserDefinedFunction, "
    -                    "but got %s." % returnType)
    -            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    -                raise ValueError(
    -                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    -            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    -                                               evalType=f.evalType,
    -                                               deterministic=f.deterministic)
    -            return_udf = f
    -        else:
    -            if returnType is None:
    -                returnType = StringType()
    -            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    -            return_udf = register_udf._wrapped()
    -        self._jsparkSession.udf().registerPython(name, register_udf._judf)
    -        return return_udf
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self._sparkSession.udf.register(name, f, returnType)
    +    # Reuse the docstring from UDFRegistration but with few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    --- End diff --
    
    Oh, wait... I think this is another good alternative. Will double-check and be back today.


---



[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162031475
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -181,3 +183,180 @@ def asNondeterministic(self):
             """
             self.deterministic = False
             return self
    +
    +
    +class UDFRegistration(object):
    --- End diff --
    
    This seems to have been introduced in 1.3.1 - https://issues.apache.org/jira/browse/SPARK-6603


---



[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162031848
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
             """
             return self.sparkSession.range(start, end, step, numPartitions)
     
    -    @ignore_unicode_prefix
    -    @since(1.2)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
    -        >>> sqlContext.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = sqlContext.udf.register("slen", slen)
    -        >>> sqlContext.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
    -        >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -        return self.sparkSession.catalog.registerFunction(name, f, returnType)
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.register(name, f, returnType)
    +    # Reuse the docstring from UDFRegistration but with few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    +    registerFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.1)
    -    def registerJavaFunction(self, name, javaClassName, returnType=None):
    -        """Register a java UDF so it can be used in SQL statements.
    -
    -        In addition to a name and the function itself, the return type can be optionally specified.
    -        When the return type is not specified we would infer it via reflection.
    -        :param name:  name of the UDF
    -        :param javaClassName: fully qualified name of java class
    -        :param returnType: a :class:`pyspark.sql.types.DataType` object
    -
    -        >>> sqlContext.registerJavaFunction("javaStringLength",
    -        ...   "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    -        >>> sqlContext.sql("SELECT javaStringLength('test')").collect()
    -        [Row(UDF:javaStringLength(test)=4)]
    -        >>> sqlContext.registerJavaFunction("javaStringLength2",
    -        ...   "test.org.apache.spark.sql.JavaStringLength")
    -        >>> sqlContext.sql("SELECT javaStringLength2('test')").collect()
    -        [Row(UDF:javaStringLength2(test)=4)]
    +        .. note:: :func:`sqlContext.registerFunction` is an alias for
    +            :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    +        .. versionadded:: 1.2
    +    """ % _register_doc[:_register_doc.rfind('versionadded::')]
     
    -        """
    -        jdt = None
    -        if returnType is not None:
    -            jdt = self.sparkSession._jsparkSession.parseDataType(returnType.json())
    -        self.sparkSession._jsparkSession.udf().registerJava(name, javaClassName, jdt)
    +    def registerJavaFunction(self, name, javaClassName, returnType=None):
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.registerJavaFunction instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.registerJavaFunction(name, javaClassName, returnType)
    +    _registerJavaFunction_doc = UDFRegistration.registerJavaFunction.__doc__.strip()
    +    registerJavaFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.3)
    -    def registerJavaUDAF(self, name, javaClassName):
    -        """Register a java UDAF so it can be used in SQL statements.
    -
    -        :param name:  name of the UDAF
    -        :param javaClassName: fully qualified name of java class
    -
    -        >>> sqlContext.registerJavaUDAF("javaUDAF",
    -        ...   "test.org.apache.spark.sql.MyDoubleAvg")
    -        >>> df = sqlContext.createDataFrame([(1, "a"),(2, "b"), (3, "a")],["id", "name"])
    -        >>> df.registerTempTable("df")
    -        >>> sqlContext.sql("SELECT name, javaUDAF(id) as avg from df group by name").collect()
    -        [Row(name=u'b', avg=102.0), Row(name=u'a', avg=102.0)]
    -        """
    -        self.sparkSession._jsparkSession.udf().registerJavaUDAF(name, javaClassName)
    +        .. note:: :func:`sqlContext.registerJavaFunction` is an alias for
    +            :func:`spark.udf.registerJavaFunction`
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.registerJavaFunction` instead.
    +        .. versionadded:: 2.1
    +    """ % _registerJavaFunction_doc[:_registerJavaFunction_doc.rfind('versionadded::')]
    --- End diff --
    
    We are fine to use the `rfind` approach here because the `since` decorator appends `versionadded::` at the very end of the docstring, so slicing at the last occurrence is always safe.
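
    For reference, a minimal, runnable sketch of the docstring-reuse pattern under discussion (the `since` below is a simplified stand-in for PySpark's real decorator, and the method bodies are stubs, so treat this as an illustration rather than the actual implementation):

    ```python
    import warnings

    def since(version):
        # Simplified stand-in for pyspark's `since`: it appends a
        # `versionadded` directive at the very end of the docstring,
        # which is what makes the rfind-based slicing below reliable.
        def decorator(f):
            f.__doc__ = f.__doc__.rstrip() + "\n\n.. versionadded:: %s\n" % version
            return f
        return decorator

    class UDFRegistration(object):
        @since(1.3)
        def register(self, name, f, returnType=None):
            """Registers a Python function as a user-defined function.

            :param name: name of the user-defined function in SQL statements.
            """

    class SQLContext(object):
        def registerFunction(self, name, f, returnType=None):
            warnings.warn(
                "Deprecated in 2.3.0. Use spark.udf.register instead.",
                DeprecationWarning)

        # Reuse the shared docstring, cutting it at the last `versionadded::`
        # (guaranteed by `since` to exist and to come last) and appending
        # context-specific notes plus a new directive.
        _register_doc = UDFRegistration.register.__doc__.strip()
        registerFunction.__doc__ = """%s
            .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
            .. versionadded:: 1.2
        """ % _register_doc[:_register_doc.rfind('versionadded::')]

    print(SQLContext.registerFunction.__doc__)  # inspect the recombined docstring
    ```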


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86309/testReport)** for PR 20288 at commit [`e121273`](https://github.com/apache/spark/commit/e121273972d0ec0d94cc01e4426358b4e5fb7e2c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86273/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161983722
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -224,92 +225,18 @@ def dropGlobalTempView(self, viewName):
             """
             self._jcatalog.dropGlobalTempView(viewName)
     
    -    @ignore_unicode_prefix
    -    @since(2.0)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = spark.catalog.registerFunction("stringLengthString", len)
    -        >>> spark.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = spark.udf.register("slen", slen)
    -        >>> spark.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
    -        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -
    -        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    -        if hasattr(f, 'asNondeterministic'):
    -            if returnType is not None:
    -                raise TypeError(
    -                    "Invalid returnType: None is expected when f is a UserDefinedFunction, "
    -                    "but got %s." % returnType)
    -            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    -                raise ValueError(
    -                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    -            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    -                                               evalType=f.evalType,
    -                                               deterministic=f.deterministic)
    -            return_udf = f
    -        else:
    -            if returnType is None:
    -                returnType = StringType()
    -            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    -            return_udf = register_udf._wrapped()
    -        self._jsparkSession.udf().registerPython(name, register_udf._judf)
    -        return return_udf
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self._sparkSession.udf.register(name, f, returnType)
     +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    registerFunction.__doc__ = """%s
    +        .. note:: :func:`spark.catalog.registerFunction` is an alias
    +            for :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    --- End diff --
    
    I think we should remove this. WDYT @cloud-fan?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162035576
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
             """
             return self.sparkSession.range(start, end, step, numPartitions)
     
    -    @ignore_unicode_prefix
    -    @since(1.2)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
    -        >>> sqlContext.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = sqlContext.udf.register("slen", slen)
    -        >>> sqlContext.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
    -        >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -        return self.sparkSession.catalog.registerFunction(name, f, returnType)
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.register(name, f, returnType)
     +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    +    registerFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.1)
    -    def registerJavaFunction(self, name, javaClassName, returnType=None):
    -        """Register a java UDF so it can be used in SQL statements.
    -
    -        In addition to a name and the function itself, the return type can be optionally specified.
    -        When the return type is not specified we would infer it via reflection.
    -        :param name:  name of the UDF
    -        :param javaClassName: fully qualified name of java class
    -        :param returnType: a :class:`pyspark.sql.types.DataType` object
    -
    -        >>> sqlContext.registerJavaFunction("javaStringLength",
    -        ...   "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    -        >>> sqlContext.sql("SELECT javaStringLength('test')").collect()
    -        [Row(UDF:javaStringLength(test)=4)]
    -        >>> sqlContext.registerJavaFunction("javaStringLength2",
    -        ...   "test.org.apache.spark.sql.JavaStringLength")
    -        >>> sqlContext.sql("SELECT javaStringLength2('test')").collect()
    -        [Row(UDF:javaStringLength2(test)=4)]
    +        .. note:: :func:`sqlContext.registerFunction` is an alias for
    +            :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    +        .. versionadded:: 1.2
    +    """ % _register_doc[:_register_doc.rfind('versionadded::')]
     
    -        """
    -        jdt = None
    -        if returnType is not None:
    -            jdt = self.sparkSession._jsparkSession.parseDataType(returnType.json())
    -        self.sparkSession._jsparkSession.udf().registerJava(name, javaClassName, jdt)
    +    def registerJavaFunction(self, name, javaClassName, returnType=None):
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.registerJavaFunction instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.registerJavaFunction(name, javaClassName, returnType)
    +    _registerJavaFunction_doc = UDFRegistration.registerJavaFunction.__doc__.strip()
    +    registerJavaFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.3)
    -    def registerJavaUDAF(self, name, javaClassName):
    -        """Register a java UDAF so it can be used in SQL statements.
    -
    -        :param name:  name of the UDAF
    -        :param javaClassName: fully qualified name of java class
    -
    -        >>> sqlContext.registerJavaUDAF("javaUDAF",
    -        ...   "test.org.apache.spark.sql.MyDoubleAvg")
    -        >>> df = sqlContext.createDataFrame([(1, "a"),(2, "b"), (3, "a")],["id", "name"])
    -        >>> df.registerTempTable("df")
    -        >>> sqlContext.sql("SELECT name, javaUDAF(id) as avg from df group by name").collect()
    -        [Row(name=u'b', avg=102.0), Row(name=u'a', avg=102.0)]
    -        """
    -        self.sparkSession._jsparkSession.udf().registerJavaUDAF(name, javaClassName)
    +        .. note:: :func:`sqlContext.registerJavaFunction` is an alias for
    +            :func:`spark.udf.registerJavaFunction`
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.registerJavaFunction` instead.
    +        .. versionadded:: 2.1
    +    """ % _registerJavaFunction_doc[:_registerJavaFunction_doc.rfind('versionadded::')]
    --- End diff --
    
    <img width="699" alt="2018-01-17 9 22 57" src="https://user-images.githubusercontent.com/6477701/35042699-076059e6-fbcd-11e7-8737-50c45d681f33.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM except one minor comment. 
    
    Could you submit a follow-up PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Thanks! Merging to master/2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    @gatorsmile, I am sorry, I don't know why I missed this comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162035758
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -181,3 +183,179 @@ def asNondeterministic(self):
             """
             self.deterministic = False
             return self
    +
    +
    +class UDFRegistration(object):
    +    """
    +    Wrapper for user-defined function registration.
    +
    +    .. versionadded:: 1.3.1
    +    """
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    @since(1.3)
    +    def register(self, name, f, returnType=None):
    +        """Registers a Python function (including lambda function) or a user-defined function
    +        in SQL statements.
    +
    +        :param name: name of the user-defined function in SQL statements.
    +        :param f: a Python function, or a user-defined function. The user-defined function can
    +            be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
    +            :meth:`pyspark.sql.functions.pandas_udf`.
    +        :param returnType: the return type of the registered user-defined function.
    +        :return: a user-defined function.
    +
    +        `returnType` can be optionally specified when `f` is a Python function but not
    +        when `f` is a user-defined function. Please see below.
    +
    +        1. When `f` is a Python function:
    +
    +            `returnType` defaults to string type and can be optionally specified. The produced
    +            object must match the specified type. In this case, this API works as if
    +            `register(name, f, returnType=StringType())`.
    +
    +            >>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
    +            >>> spark.sql("SELECT stringLengthString('test')").collect()
    +            [Row(stringLengthString(test)=u'4')]
    +
    +            >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    +            [Row(stringLengthString(text)=u'3')]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +        2. When `f` is a user-defined function:
    +
    +            Spark uses the return type of the given user-defined function as the return type of
    +            the registered user-defined function. `returnType` should not be specified.
    +            In this case, this API works as if `register(name, f)`.
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> from pyspark.sql.functions import udf
    +            >>> slen = udf(lambda s: len(s), IntegerType())
    +            >>> _ = spark.udf.register("slen", slen)
    +            >>> spark.sql("SELECT slen('test')").collect()
    +            [Row(slen(test)=4)]
    +
    +            >>> import random
    +            >>> from pyspark.sql.functions import udf
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +            >>> new_random_udf = spark.udf.register("random_udf", random_udf)
    +            >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +            [Row(random_udf()=82)]
    +
    +            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +            >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    +            ... def add_one(x):
    +            ...     return x + 1
    +            ...
    +            >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    +            >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    +            [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    +
    +            .. note:: Registration for a user-defined function (case 2.) was added from
    +                Spark 2.3.0.
    +        """
    --- End diff --
    
    <img width="715" alt="2018-01-17 9 23 21" src="https://user-images.githubusercontent.com/6477701/35042729-1acaa234-fbcd-11e7-9d3f-4e94dc200e2c.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161964247
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -224,92 +225,18 @@ def dropGlobalTempView(self, viewName):
             """
             self._jcatalog.dropGlobalTempView(viewName)
     
    -    @ignore_unicode_prefix
    -    @since(2.0)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = spark.catalog.registerFunction("stringLengthString", len)
    -        >>> spark.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = spark.udf.register("slen", slen)
    -        >>> spark.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
    -        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -
    -        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    -        if hasattr(f, 'asNondeterministic'):
    -            if returnType is not None:
    -                raise TypeError(
    -                    "Invalid returnType: None is expected when f is a UserDefinedFunction, "
    -                    "but got %s." % returnType)
    -            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    -                raise ValueError(
    -                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    -            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    -                                               evalType=f.evalType,
    -                                               deterministic=f.deterministic)
    -            return_udf = f
    -        else:
    -            if returnType is None:
    -                returnType = StringType()
    -            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    -            return_udf = register_udf._wrapped()
    -        self._jsparkSession.udf().registerPython(name, register_udf._judf)
    -        return return_udf
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self._sparkSession.udf.register(name, f, returnType)
     +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    registerFunction.__doc__ = """%s
    +        .. note:: :func:`spark.catalog.registerFunction` is an alias
    +            for :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    --- End diff --
    
    Do we have any plan (e.g. 3.0.0) to remove this alias?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86274/testReport)** for PR 20288 at commit [`4367beb`](https://github.com/apache/spark/commit/4367beb7f165328d2b7357c27ba1e34ddf112825).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86234/testReport)** for PR 20288 at commit [`f63105c`](https://github.com/apache/spark/commit/f63105c7faddc79ccd624c9234b56916efec3569).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class UDFRegistration(object):`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM pending Jenkins.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86266/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86266/testReport)** for PR 20288 at commit [`c6ed44a`](https://github.com/apache/spark/commit/c6ed44a7e125ff5e86b9734b753c07e7dc82f5a9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161965278
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -624,6 +536,9 @@ def _test():
         globs['os'] = os
         globs['sc'] = sc
         globs['sqlContext'] = SQLContext(sc)
    +    # 'spark' alias is a small hack for reusing doctests. Please see the reassignment
    +    # of docstrings above.
    +    globs['spark'] = globs['sqlContext']
    --- End diff --
    
    Shall we do `globs['spark'] = globs['sqlContext'].sparkSession`?
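
    For context, a sketch of how the doctest globals would be wired up with that change (a sketch only, assuming a local SparkContext as in the real `_test()`):

    ```python
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext('local[4]', 'PythonTest')
    globs = {'sc': sc, 'sqlContext': SQLContext(sc)}
    # The docstrings reused from UDFRegistration reference a `spark` session,
    # so the doctests need that name too; pointing it at the underlying
    # session rather than at the SQLContext keeps the alias type-accurate.
    globs['spark'] = globs['sqlContext'].sparkSession
    ```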


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86306/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86247/testReport)** for PR 20288 at commit [`08438ee`](https://github.com/apache/spark/commit/08438ee7d8c209a2dcb3eb4efeeef77451feb8d7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86274/testReport)** for PR 20288 at commit [`4367beb`](https://github.com/apache/spark/commit/4367beb7f165328d2b7357c27ba1e34ddf112825).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86264/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86308/testReport)** for PR 20288 at commit [`c9512a6`](https://github.com/apache/spark/commit/c9512a66800709417425c0d348c9327ed681420d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162035316
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
             """
             return self.sparkSession.range(start, end, step, numPartitions)
     
    -    @ignore_unicode_prefix
    -    @since(1.2)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
    -        >>> sqlContext.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = sqlContext.udf.register("slen", slen)
    -        >>> sqlContext.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
    -        >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -        return self.sparkSession.catalog.registerFunction(name, f, returnType)
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.register(name, f, returnType)
     +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    +    registerFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.1)
    -    def registerJavaFunction(self, name, javaClassName, returnType=None):
    -        """Register a java UDF so it can be used in SQL statements.
    -
    -        In addition to a name and the function itself, the return type can be optionally specified.
    -        When the return type is not specified we would infer it via reflection.
    -        :param name:  name of the UDF
    -        :param javaClassName: fully qualified name of java class
    -        :param returnType: a :class:`pyspark.sql.types.DataType` object
    -
    -        >>> sqlContext.registerJavaFunction("javaStringLength",
    -        ...   "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    -        >>> sqlContext.sql("SELECT javaStringLength('test')").collect()
    -        [Row(UDF:javaStringLength(test)=4)]
    -        >>> sqlContext.registerJavaFunction("javaStringLength2",
    -        ...   "test.org.apache.spark.sql.JavaStringLength")
    -        >>> sqlContext.sql("SELECT javaStringLength2('test')").collect()
    -        [Row(UDF:javaStringLength2(test)=4)]
    +        .. note:: :func:`sqlContext.registerFunction` is an alias for
    +            :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    +        .. versionadded:: 1.2
    +    """ % _register_doc[:_register_doc.rfind('versionadded::')]
    --- End diff --
    
    <img width="654" alt="2018-01-17 9 22 46" src="https://user-images.githubusercontent.com/6477701/35042660-e7b77f52-fbcc-11e7-8304-d36ea8d37daa.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r161964383
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -147,7 +147,8 @@ def udf(self):
     
             :return: :class:`UDFRegistration`
             """
    -        return UDFRegistration(self)
    +        from pyspark.sql.session import UDFRegistration
    --- End diff --
    
    Why do we import `UDFRegistration` here again? Isn't it imported at the top?
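
    (For background: one common reason for a function-level import like this is to break a circular dependency — if two modules import each other at the top level, loading either fails on a partially initialized module. A hypothetical two-file illustration, not the actual pyspark module layout:)

    ```python
    # a.py (hypothetical) -- defers the import of b until call time,
    # so importing a.py no longer requires b.py to be initialized first.
    def make_registration(session):
        from b import UDFRegistration  # deferred import breaks the cycle
        return UDFRegistration(session)

    # b.py (hypothetical) -- can now safely import a at the top level.
    import a

    class UDFRegistration(object):
        def __init__(self, session):
            self.session = session
    ```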


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86247/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162035180
  
    --- Diff: python/pyspark/sql/catalog.py ---
    @@ -224,92 +224,20 @@ def dropGlobalTempView(self, viewName):
             """
             self._jcatalog.dropGlobalTempView(viewName)
     
    -    @ignore_unicode_prefix
    -    @since(2.0)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = spark.catalog.registerFunction("stringLengthString", len)
    -        >>> spark.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
    -        >>> spark.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = spark.udf.register("slen", slen)
    -        >>> spark.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
    -        >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -
    -        # This is to check whether the input function is a wrapped/native UserDefinedFunction
    -        if hasattr(f, 'asNondeterministic'):
    -            if returnType is not None:
    -                raise TypeError(
    -                    "Invalid returnType: None is expected when f is a UserDefinedFunction, "
    -                    "but got %s." % returnType)
    -            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    -                raise ValueError(
    -                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    -            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    -                                               evalType=f.evalType,
    -                                               deterministic=f.deterministic)
    -            return_udf = f
    -        else:
    -            if returnType is None:
    -                returnType = StringType()
    -            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    -                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    -            return_udf = register_udf._wrapped()
    -        self._jsparkSession.udf().registerPython(name, register_udf._judf)
    -        return return_udf
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self._sparkSession.udf.register(name, f, returnType)
     +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    +    registerFunction.__doc__ = """%s
    +
    +        .. note:: :func:`spark.catalog.registerFunction` is an alias
    +            for :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    +        .. versionadded:: 2.0
    +    """ % _register_doc[:_register_doc.rfind('versionadded::')]
    --- End diff --
    
    <img width="686" alt="2018-01-17 9 21 41" src="https://user-images.githubusercontent.com/6477701/35042642-d3c7aa76-fbcc-11e7-82bc-9f56fc4e9636.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86265/testReport)** for PR 20288 at commit [`f1fe40a`](https://github.com/apache/spark/commit/f1fe40a5afe876cf3b81208af7bc1cd379bcb732).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20288


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162031948
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
             """
             return self.sparkSession.range(start, end, step, numPartitions)
     
    -    @ignore_unicode_prefix
    -    @since(1.2)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
    -        >>> sqlContext.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = sqlContext.udf.register("slen", slen)
    -        >>> sqlContext.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
    -        >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -        return self.sparkSession.catalog.registerFunction(name, f, returnType)
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.register(name, f, returnType)
    +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    +    registerFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.1)
    -    def registerJavaFunction(self, name, javaClassName, returnType=None):
    -        """Register a java UDF so it can be used in SQL statements.
    -
    -        In addition to a name and the function itself, the return type can be optionally specified.
    -        When the return type is not specified we would infer it via reflection.
    -        :param name:  name of the UDF
    -        :param javaClassName: fully qualified name of java class
    -        :param returnType: a :class:`pyspark.sql.types.DataType` object
    -
    -        >>> sqlContext.registerJavaFunction("javaStringLength",
    -        ...   "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    -        >>> sqlContext.sql("SELECT javaStringLength('test')").collect()
    -        [Row(UDF:javaStringLength(test)=4)]
    -        >>> sqlContext.registerJavaFunction("javaStringLength2",
    -        ...   "test.org.apache.spark.sql.JavaStringLength")
    -        >>> sqlContext.sql("SELECT javaStringLength2('test')").collect()
    -        [Row(UDF:javaStringLength2(test)=4)]
    +        .. note:: :func:`sqlContext.registerFunction` is an alias for
    +            :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    +        .. versionadded:: 1.2
    +    """ % _register_doc[:_register_doc.rfind('versionadded::')]
     
    -        """
    -        jdt = None
    -        if returnType is not None:
    -            jdt = self.sparkSession._jsparkSession.parseDataType(returnType.json())
    -        self.sparkSession._jsparkSession.udf().registerJava(name, javaClassName, jdt)
    +    def registerJavaFunction(self, name, javaClassName, returnType=None):
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.registerJavaFunction instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.registerJavaFunction(name, javaClassName, returnType)
    +    _registerJavaFunction_doc = UDFRegistration.registerJavaFunction.__doc__.strip()
    +    registerJavaFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.3)
    -    def registerJavaUDAF(self, name, javaClassName):
    --- End diff --
    
    We are fine to remove this one because it was added within the 2.3.0 timeline - https://issues.apache.org/jira/browse/SPARK-19439
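
    For reference, that 2.3.0-only API lives on under the new namespace and would be used
    roughly as below. This is a sketch only; `test.org.apache.spark.sql.MyDoubleAvg` is
    assumed here to be one of Spark's bundled test UDAF classes, and registration fails if
    no such class is on the classpath:

        # Register a Java UDAF by its fully qualified class name (assumed test
        # class), then call it from SQL like any built-in aggregate function.
        spark.udf.registerJavaUDAF("javaUDAF", "test.org.apache.spark.sql.MyDoubleAvg")
        spark.sql("SELECT javaUDAF(id) FROM range(10)").collect()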


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86265/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86306/testReport)** for PR 20288 at commit [`3e0147b`](https://github.com/apache/spark/commit/3e0147bd11b980d91a2b628b85c5d6a05391b28e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86264/testReport)** for PR 20288 at commit [`6b9b9c4`](https://github.com/apache/spark/commit/6b9b9c44ea7cafa7e1fb607bcf5a2d19336f31f4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86308/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162031680
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -181,3 +183,180 @@ def asNondeterministic(self):
             """
             self.deterministic = False
             return self
    +
    +
    +class UDFRegistration(object):
    +    """
    +    Wrapper for user-defined function registration.
    +
    +    .. versionadded:: 1.3.1
    +    """
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    @since(1.3)
    +    def register(self, name, f, returnType=None):
    +        """Registers a Python function (including lambda function) or a user-defined function
    +        in SQL statements.
    +
    +        :param name: name of the user-defined function in SQL statements.
    +        :param f: a Python function, or a user-defined function. The user-defined function can
    +            be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
    +            :meth:`pyspark.sql.functions.pandas_udf`.
    +        :param returnType: the return type of the registered user-defined function.
    +        :return: a user-defined function.
    +
    +        `returnType` can be optionally specified when `f` is a Python function but not
    +        when `f` is a user-defined function. See below:
    +
    +        1. When `f` is a Python function:
    +
    +            `returnType` defaults to string type and can be optionally specified. The produced
    +            object must match the specified type. In this case, this API works as if
    +            `register(name, f, returnType=StringType())`.
    +
    +            >>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
    +            >>> spark.sql("SELECT stringLengthString('test')").collect()
    +            [Row(stringLengthString(test)=u'4')]
    +
    +            >>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    +            [Row(stringLengthString(text)=u'3')]
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    +            >>> spark.sql("SELECT stringLengthInt('test')").collect()
    +            [Row(stringLengthInt(test)=4)]
    +
    +        2. When `f` is a user-defined function:
    +
    +            Spark uses the return type of the given user-defined function as the return type of
    +            the registered user-defined function. `returnType` should not be specified.
    +            In this case, this API works as if `register(name, f)`.
    +
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> from pyspark.sql.functions import udf
    +            >>> slen = udf(lambda s: len(s), IntegerType())
    +            >>> _ = spark.udf.register("slen", slen)
    +            >>> spark.sql("SELECT slen('test')").collect()
    +            [Row(slen(test)=4)]
    +
    +            >>> import random
    +            >>> from pyspark.sql.functions import udf
    +            >>> from pyspark.sql.types import IntegerType
    +            >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    +            >>> new_random_udf = spark.udf.register("random_udf", random_udf)
    +            >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    +            [Row(random_udf()=82)]
    +
    +            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +            >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    +            ... def add_one(x):
    +            ...     return x + 1
    +            ...
    +            >>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
    +            >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    +            [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    +
    +            .. note:: Registration of a user-defined function (case 2) was added in
    +                Spark 2.3.0.
    +        """
    +
    +        # This is to check whether the input function is a user-defined function or
    +        # a plain Python function.
    +        if hasattr(f, 'asNondeterministic'):
    +            if returnType is not None:
    +                raise TypeError(
    +                    "Invalid returnType: data type can not be specified when f is"
    +                    "a user-defined function, but got %s." % returnType)
    +            if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    +                                  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
    +                raise ValueError(
    +                    "Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
    +            register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
    +                                               evalType=f.evalType,
    +                                               deterministic=f.deterministic)
    +            return_udf = f
    +        else:
    +            if returnType is None:
    +                returnType = StringType()
    +            register_udf = UserDefinedFunction(f, returnType=returnType, name=name,
    +                                               evalType=PythonEvalType.SQL_BATCHED_UDF)
    +            return_udf = register_udf._wrapped()
    +        self.sparkSession._jsparkSession.udf().registerPython(name, register_udf._judf)
    +        return return_udf
    +
    +    @ignore_unicode_prefix
    +    @since(2.3)
    +    def registerJavaFunction(self, name, javaClassName, returnType=None):
    --- End diff --
    
    `registerJavaFunction` and `registerJavaUDAF` look like they were introduced in 2.3.0 - https://issues.apache.org/jira/browse/SPARK-19439
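
    For completeness, the Java UDF case reads roughly as below under the new namespace,
    reusing the class name from the doctests being moved (sketch only; it runs only where
    Spark's test class `test.org.apache.spark.sql.JavaStringLength` is on the classpath):

        from pyspark.sql.types import IntegerType

        # Preferred 2.3.0+ entry point; returnType is optional and is inferred
        # via reflection when omitted, per the old docstring above.
        spark.udf.registerJavaFunction(
            "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
        spark.sql("SELECT javaStringLength('test')").collect()
        # [Row(UDF:javaStringLength(test)=4)]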


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162035427
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
             """
             return self.sparkSession.range(start, end, step, numPartitions)
     
    -    @ignore_unicode_prefix
    -    @since(1.2)
         def registerFunction(self, name, f, returnType=None):
    -        """Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
    -        as a UDF. The registered UDF can be used in SQL statements.
    -
    -        :func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
    -
    -        In addition to a name and the function itself, `returnType` can be optionally specified.
    -        1) When f is a Python function, `returnType` defaults to a string. The produced object must
    -        match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
    -        type of the given UDF as the return type of the registered UDF. The input parameter
    -        `returnType` is None by default. If given by users, the value must be None.
    -
    -        :param name: name of the UDF in SQL statements.
    -        :param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
    -            row-at-a-time or vectorized.
    -        :param returnType: the return type of the registered UDF.
    -        :return: a wrapped/native :class:`UserDefinedFunction`
    -
    -        >>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
    -        >>> sqlContext.sql("SELECT stringLengthString('test')").collect()
    -        [Row(stringLengthString(test)=u'4')]
    -
    -        >>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
    -        [Row(stringLengthString(text)=u'3')]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    -        >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    -        [Row(stringLengthInt(test)=4)]
    -
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> from pyspark.sql.functions import udf
    -        >>> slen = udf(lambda s: len(s), IntegerType())
    -        >>> _ = sqlContext.udf.register("slen", slen)
    -        >>> sqlContext.sql("SELECT slen('test')").collect()
    -        [Row(slen(test)=4)]
    -
    -        >>> import random
    -        >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType
    -        >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
    -        >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
    -        [Row(random_udf()=82)]
    -        >>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
    -        [Row(<lambda>()=26)]
    -
    -        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    -        >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
    -        ... def add_one(x):
    -        ...     return x + 1
    -        ...
    -        >>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
    -        >>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
    -        [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
    -        """
    -        return self.sparkSession.catalog.registerFunction(name, f, returnType)
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.register instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.register(name, f, returnType)
    +    # Reuse the docstring from UDFRegistration but with a few notes.
    +    _register_doc = UDFRegistration.register.__doc__.strip()
    +    registerFunction.__doc__ = """%s
     
    -    @ignore_unicode_prefix
    -    @since(2.1)
    -    def registerJavaFunction(self, name, javaClassName, returnType=None):
    -        """Register a java UDF so it can be used in SQL statements.
    -
    -        In addition to a name and the function itself, the return type can be optionally specified.
    -        When the return type is not specified we would infer it via reflection.
    -        :param name:  name of the UDF
    -        :param javaClassName: fully qualified name of java class
    -        :param returnType: a :class:`pyspark.sql.types.DataType` object
    -
    -        >>> sqlContext.registerJavaFunction("javaStringLength",
    -        ...   "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    -        >>> sqlContext.sql("SELECT javaStringLength('test')").collect()
    -        [Row(UDF:javaStringLength(test)=4)]
    -        >>> sqlContext.registerJavaFunction("javaStringLength2",
    -        ...   "test.org.apache.spark.sql.JavaStringLength")
    -        >>> sqlContext.sql("SELECT javaStringLength2('test')").collect()
    -        [Row(UDF:javaStringLength2(test)=4)]
    +        .. note:: :func:`sqlContext.registerFunction` is an alias for
    +            :func:`spark.udf.register`.
    +        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
    +        .. versionadded:: 1.2
    +    """ % _register_doc[:_register_doc.rfind('versionadded::')]
     
    -        """
    -        jdt = None
    -        if returnType is not None:
    -            jdt = self.sparkSession._jsparkSession.parseDataType(returnType.json())
    -        self.sparkSession._jsparkSession.udf().registerJava(name, javaClassName, jdt)
    +    def registerJavaFunction(self, name, javaClassName, returnType=None):
    +        warnings.warn(
    +            "Deprecated in 2.3.0. Use spark.udf.registerJavaFunction instead.",
    +            DeprecationWarning)
    +        return self.sparkSession.udf.registerJavaFunction(name, javaClassName, returnType)
    +    _registerJavaFunction_doc = UDFRegistration.registerJavaFunction.__doc__.strip()
    +    registerJavaFunction.__doc__ = """%s
    --- End diff --
    
    <img width="699" alt="2018-01-17 9 22 57" src="https://user-images.githubusercontent.com/6477701/35042684-f835dbe4-fbcc-11e7-8b58-360c25f8e291.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    @ueshin and @icexelloss (docstring reassignment) and @cloud-fan (deprecation), could you guys take a look and see if I understood your suggestions correctly?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    **[Test build #86266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86266/testReport)** for PR 20288 at commit [`c6ed44a`](https://github.com/apache/spark/commit/c6ed44a7e125ff5e86b9734b753c07e7dc82f5a9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20288
  
    LGTM pending Jenkins.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20288#discussion_r162448507
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -181,3 +183,179 @@ def asNondeterministic(self):
             """
             self.deterministic = False
             return self
    +
    +
    +class UDFRegistration(object):
    +    """
    +    Wrapper for user-defined function registration. This instance can be accessed by
    +    :attr:`spark.udf` or :attr:`sqlContext.udf`.
    +
    +    .. versionadded:: 1.3.1
    +    """
    +
    +    def __init__(self, sparkSession):
    +        self.sparkSession = sparkSession
    +
    +    @ignore_unicode_prefix
    +    @since("1.3.1")
    +    def register(self, name, f, returnType=None):
    +        """Registers a Python function (including lambda function) or a user-defined function
    +        in SQL statements.
    +
    +        :param name: name of the user-defined function in SQL statements.
    +        :param f: a Python function, or a user-defined function. The user-defined function can
    +            be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
    +            :meth:`pyspark.sql.functions.pandas_udf`.
    +        :param returnType: the return type of the registered user-defined function.
    +        :return: a user-defined function.
    +
    +        `returnType` can be optionally specified when `f` is a Python function but not
    +        when `f` is a user-defined function. Please see below.
    --- End diff --
    
    Could you add another paragraph for explaining how to register a non-deterministic Python function? This sounds a common question from end users.
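
    For what it's worth, the pattern such a paragraph would describe is already covered by
    the doctests above; roughly (illustrative values):

        import random

        from pyspark.sql.functions import udf
        from pyspark.sql.types import IntegerType

        # Build the UDF, mark it non-deterministic, then register it by name so
        # the optimizer will not cache or collapse its results across calls.
        random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
        new_random_udf = spark.udf.register("random_udf", random_udf)

        spark.sql("SELECT random_udf()").collect()  # e.g. [Row(random_udf()=82)]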


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org