Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/01/18 06:18:51 UTC

[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/20307

    [SPARK-23141][SQL][PYSPARK] Support data type string as a returnType for registerJavaFunction.

    ## What changes were proposed in this pull request?
    
    Currently `UDFRegistration.registerJavaFunction` doesn't support a data type string as its `returnType`, whereas `UDFRegistration.register`, `@udf`, and `@pandas_udf` do.
    We can support it for `UDFRegistration.registerJavaFunction` as well.
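A minimal, self-contained sketch of the normalization this patch adds, following the `udf.py` diff quoted later in the thread: a `DataType` instance passes through unchanged, anything else is parsed as a type string, and `None` means "no explicit return type". The classes and the `parse_datatype_string` / `normalize_return_type` helpers below are hypothetical stand-ins, not PySpark's own; the real `_parse_datatype_string` handles the full DDL type syntax (for example `array<int>`), while this toy maps only three simple names.

```python
# Hypothetical stand-ins for pyspark.sql.types; NOT the real PySpark classes.
class DataType:
    def __eq__(self, other):
        # Two simple types are equal when they are the same class.
        return type(self) is type(other)

    def __hash__(self):
        return hash(type(self))

    def __repr__(self):
        return type(self).__name__ + "()"


class IntegerType(DataType):
    pass


class StringType(DataType):
    pass


# Toy lookup covering only a few simple DDL names (assumption for this sketch).
_SIMPLE_TYPES = {
    "int": IntegerType,
    "integer": IntegerType,
    "string": StringType,
}


def parse_datatype_string(s):
    """Hypothetical, simplified parser for a handful of type names."""
    key = s.strip().lower()
    try:
        return _SIMPLE_TYPES[key]()
    except KeyError:
        raise ValueError("unsupported type string in this sketch: %r" % s)


def normalize_return_type(return_type):
    """Mirror the check in the diff: keep DataType objects as-is,
    parse anything else as a type string, and leave None alone."""
    if return_type is None:
        return None
    if not isinstance(return_type, DataType):
        return_type = parse_datatype_string(return_type)
    return return_type
```

With these stand-ins, `normalize_return_type("integer")` and `normalize_return_type(IntegerType())` both yield `IntegerType()`, mirroring the equivalence the new `javaStringLength3` doctest exercises.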
    
    ## How was this patch tested?
    
    Added a doctest, and ran the existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23141

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20307
    
----
commit 1a2c01d84315e8937f0683680dd81dec5a4a3a6f
Author: Takuya UESHIN <ue...@...>
Date:   2018-01-18T06:04:30Z

    Support data type string as a returnType for registerJavaFunction.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20307#discussion_r162308708
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()):
             can fail on special rows, the workaround is to incorporate the condition into the functions.
     
         :param f: python function if used as a standalone function
    -    :param returnType: a :class:`pyspark.sql.types.DataType` object
    +    :param returnType: the return type of the registered user-defined function. The value can be
    --- End diff --
    
    Seems like a typo: `the return type of the registered user-defined function.` -> `the return type of the user-defined function.`?




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20307




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86317/




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20307#discussion_r162260194
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
             ...     "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
             >>> spark.sql("SELECT javaStringLength('test')").collect()
             [Row(UDF:javaStringLength(test)=4)]
    +
             >>> spark.udf.registerJavaFunction(
             ...     "javaStringLength2", "test.org.apache.spark.sql.JavaStringLength")
             >>> spark.sql("SELECT javaStringLength2('test')").collect()
             [Row(UDF:javaStringLength2(test)=4)]
    +
    +        >>> spark.udf.registerJavaFunction(
    +        ...     "javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer")
    +        >>> spark.sql("SELECT javaStringLength3('test')").collect()
    +        [Row(UDF:javaStringLength3(test)=4)]
             """
     
             jdt = None
             if returnType is not None:
    +            if not isinstance(returnType, DataType):
    +                returnType = _parse_datatype_string(returnType)
    --- End diff --
    
    The `returnType` param doc needs to be updated too.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    cc @HyukjinKwon 




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20307#discussion_r162259997
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
             ...     "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    --- End diff --
    
    Sure, I'll update them here soon.




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20307#discussion_r162329537
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()):
             can fail on special rows, the workaround is to incorporate the condition into the functions.
     
         :param f: python function if used as a standalone function
    -    :param returnType: a :class:`pyspark.sql.types.DataType` object
    +    :param returnType: the return type of the registered user-defined function. The value can be
    --- End diff --
    
    Oops, I'll fix it. Thanks!




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    **[Test build #86317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86317/testReport)** for PR 20307 at commit [`1a2c01d`](https://github.com/apache/spark/commit/1a2c01d84315e8937f0683680dd81dec5a4a3a6f).




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    **[Test build #86339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86339/testReport)** for PR 20307 at commit [`c731876`](https://github.com/apache/spark/commit/c7318763fea11d2615df1365960b72ea83fe94dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Merged to master and branch-2.3.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    **[Test build #86322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86322/testReport)** for PR 20307 at commit [`d41709f`](https://github.com/apache/spark/commit/d41709fc33640b3015b07f308da813b2becdb091).




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20307#discussion_r162258962
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
             ...     "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    --- End diff --
    
    Ah, it seems we need to fix `:param returnType:` in all the other related APIs too, so that it says it takes a DDL-formatted type string.
    
    @ueshin, would you mind opening a separate minor PR for this, covering `udf`, `pandas_udf`, `registerJavaFunction` and `register`? If you are busy, I will do it tonight. Doing it in this PR is fine with me too; up to you.
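One way the shared `:param returnType:` entry could read in each of the four APIs is sketched below. The wording is an assumption modeled on the sentence quoted in the `functions.py` diff (with "registered" dropped, per the typo comment), and `example_udf_api` is a hypothetical stub rather than a PySpark function.

```python
# Hypothetical stub: only the docstring matters here. The wording is an
# assumption based on the phrase quoted in the functions.py diff.
def example_udf_api(f=None, returnType=None):
    """Illustrative stub only; it does not create a UDF.

    :param f: python function if used as a standalone function
    :param returnType: the return type of the user-defined function. The value can be
        either a :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
    """
    # The stub simply echoes its argument back.
    return returnType
```

Keeping the same sentence in `udf`, `pandas_udf`, `registerJavaFunction` and `register` means users see one consistent contract, whichever entry point they use.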




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    **[Test build #86317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86317/testReport)** for PR 20307 at commit [`1a2c01d`](https://github.com/apache/spark/commit/1a2c01d84315e8937f0683680dd81dec5a4a3a6f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86322/




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    **[Test build #86322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86322/testReport)** for PR 20307 at commit [`d41709f`](https://github.com/apache/spark/commit/d41709fc33640b3015b07f308da813b2becdb091).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86339/




[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20307
  
    **[Test build #86339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86339/testReport)** for PR 20307 at commit [`c731876`](https://github.com/apache/spark/commit/c7318763fea11d2615df1365960b72ea83fe94dc).




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20307#discussion_r162260542
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
             ...     "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
             >>> spark.sql("SELECT javaStringLength('test')").collect()
             [Row(UDF:javaStringLength(test)=4)]
    +
             >>> spark.udf.registerJavaFunction(
             ...     "javaStringLength2", "test.org.apache.spark.sql.JavaStringLength")
             >>> spark.sql("SELECT javaStringLength2('test')").collect()
             [Row(UDF:javaStringLength2(test)=4)]
    +
    +        >>> spark.udf.registerJavaFunction(
    +        ...     "javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer")
    +        >>> spark.sql("SELECT javaStringLength3('test')").collect()
    +        [Row(UDF:javaStringLength3(test)=4)]
             """
     
             jdt = None
             if returnType is not None:
    +            if not isinstance(returnType, DataType):
    +                returnType = _parse_datatype_string(returnType)
    --- End diff --
    
    Yup, that's https://github.com/apache/spark/pull/20307#discussion_r162258962 :).

