Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/01/18 06:18:51 UTC
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/20307
[SPARK-23141][SQL][PYSPARK] Support data type string as a returnType for registerJavaFunction.
## What changes were proposed in this pull request?
Currently `UDFRegistration.registerJavaFunction` does not accept a data type string as its `returnType`, whereas `UDFRegistration.register`, `@udf`, and `@pandas_udf` do.
We can support it for `UDFRegistration.registerJavaFunction` as well.
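A minimal sketch of the normalization this change describes, using stand-in classes rather than PySpark itself so that it runs without a Spark installation. `DataType`, `IntegerType`, `StringType`, `parse_datatype_string`, and `normalize_return_type` below are hypothetical substitutes for illustration only; the actual patch calls PySpark's `_parse_datatype_string`, as the diff quoted later in this thread shows.

```python
# Stand-in types: these are NOT the pyspark.sql.types classes, only
# minimal substitutes so the example is self-contained.
class DataType:
    pass


class IntegerType(DataType):
    pass


class StringType(DataType):
    pass


# Hypothetical lookup table; the real parser supports far more type strings.
_TYPES_BY_NAME = {"integer": IntegerType, "string": StringType}


def parse_datatype_string(type_string):
    """Turn a type name such as "integer" into a stand-in DataType instance."""
    key = type_string.strip().lower()
    if key not in _TYPES_BY_NAME:
        raise ValueError("unsupported type string: %r" % type_string)
    return _TYPES_BY_NAME[key]()


def normalize_return_type(return_type=None):
    """Accept None, a DataType instance, or a type string."""
    if return_type is None:
        # No explicit return type was given; leave it to the caller.
        return None
    if not isinstance(return_type, DataType):
        # Same shape as the check added in the patch: anything that is not
        # already a DataType is treated as a type string and parsed.
        return_type = parse_datatype_string(return_type)
    return return_type
```

With this shape, callers may pass `IntegerType()` or `"integer"` interchangeably, and an omitted `returnType` stays `None`.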
## How was this patch tested?
Added a doctest; the change is also covered by existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23141
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20307.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20307
----
commit 1a2c01d84315e8937f0683680dd81dec5a4a3a6f
Author: Takuya UESHIN <ue...@...>
Date: 2018-01-18T06:04:30Z
Support data type string as a returnType for registerJavaFunction.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20307#discussion_r162308708
--- Diff: python/pyspark/sql/functions.py ---
@@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()):
can fail on special rows, the workaround is to incorporate the condition into the functions.
:param f: python function if used as a standalone function
- :param returnType: a :class:`pyspark.sql.types.DataType` object
+ :param returnType: the return type of the registered user-defined function. The value can be
--- End diff --
This looks like a typo: `the return type of the registered user-defined function.` -> `the return type of the user-defined function.`?
---
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20307
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20307
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86317/
Test PASSed.
---
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20307#discussion_r162260194
--- Diff: python/pyspark/sql/udf.py ---
@@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
... "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
>>> spark.sql("SELECT javaStringLength('test')").collect()
[Row(UDF:javaStringLength(test)=4)]
+
>>> spark.udf.registerJavaFunction(
... "javaStringLength2", "test.org.apache.spark.sql.JavaStringLength")
>>> spark.sql("SELECT javaStringLength2('test')").collect()
[Row(UDF:javaStringLength2(test)=4)]
+
+ >>> spark.udf.registerJavaFunction(
+ ... "javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer")
+ >>> spark.sql("SELECT javaStringLength3('test')").collect()
+ [Row(UDF:javaStringLength3(test)=4)]
"""
jdt = None
if returnType is not None:
+ if not isinstance(returnType, DataType):
+ returnType = _parse_datatype_string(returnType)
--- End diff --
The param doc needs to be modified too.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20307
cc @HyukjinKwon
---
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20307#discussion_r162259997
--- Diff: python/pyspark/sql/udf.py ---
@@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
... "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
--- End diff --
Sure, I'll update them here soon.
---
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20307#discussion_r162329537
--- Diff: python/pyspark/sql/functions.py ---
@@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()):
can fail on special rows, the workaround is to incorporate the condition into the functions.
:param f: python function if used as a standalone function
- :param returnType: a :class:`pyspark.sql.types.DataType` object
+ :param returnType: the return type of the registered user-defined function. The value can be
--- End diff --
Oops, I'll fix it. Thanks!
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20307
**[Test build #86317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86317/testReport)** for PR 20307 at commit [`1a2c01d`](https://github.com/apache/spark/commit/1a2c01d84315e8937f0683680dd81dec5a4a3a6f).
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20307
**[Test build #86339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86339/testReport)** for PR 20307 at commit [`c731876`](https://github.com/apache/spark/commit/c7318763fea11d2615df1365960b72ea83fe94dc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20307
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20307
Merged to master and branch-2.3.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20307
**[Test build #86322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86322/testReport)** for PR 20307 at commit [`d41709f`](https://github.com/apache/spark/commit/d41709fc33640b3015b07f308da813b2becdb091).
---
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20307#discussion_r162258962
--- Diff: python/pyspark/sql/udf.py ---
@@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
... "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
--- End diff --
Ah, it seems we need to fix `:param returnType:` across all the other related APIs so that it says a DDL-formatted type string is accepted.
@ueshin, would you mind opening a separate minor PR for this, covering `udf`, `pandas_udf`, `registerJavaFunction` and `register`? If you are busy, I will do it tonight. Doing it here is fine with me too; it's up to you.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20307
**[Test build #86317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86317/testReport)** for PR 20307 at commit [`1a2c01d`](https://github.com/apache/spark/commit/1a2c01d84315e8937f0683680dd81dec5a4a3a6f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20307
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86322/
Test PASSed.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20307
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20307
**[Test build #86322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86322/testReport)** for PR 20307 at commit [`d41709f`](https://github.com/apache/spark/commit/d41709fc33640b3015b07f308da813b2becdb091).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20307
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20307
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86339/
Test PASSed.
---
[GitHub] spark issue #20307: [SPARK-23141][SQL][PYSPARK] Support data type string as ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20307
**[Test build #86339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86339/testReport)** for PR 20307 at commit [`c731876`](https://github.com/apache/spark/commit/c7318763fea11d2615df1365960b72ea83fe94dc).
---
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20307#discussion_r162260542
--- Diff: python/pyspark/sql/udf.py ---
@@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
... "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
>>> spark.sql("SELECT javaStringLength('test')").collect()
[Row(UDF:javaStringLength(test)=4)]
+
>>> spark.udf.registerJavaFunction(
... "javaStringLength2", "test.org.apache.spark.sql.JavaStringLength")
>>> spark.sql("SELECT javaStringLength2('test')").collect()
[Row(UDF:javaStringLength2(test)=4)]
+
+ >>> spark.udf.registerJavaFunction(
+ ... "javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer")
+ >>> spark.sql("SELECT javaStringLength3('test')").collect()
+ [Row(UDF:javaStringLength3(test)=4)]
"""
jdt = None
if returnType is not None:
+ if not isinstance(returnType, DataType):
+ returnType = _parse_datatype_string(returnType)
--- End diff --
Yup, that's https://github.com/apache/spark/pull/20307#discussion_r162258962 :).
---