Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/10/31 04:31:30 UTC

[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/19617

    [SPARK-22347][PySpark][DOC] Add document to notice users for using udfs with conditional expressions

    ## What changes were proposed in this pull request?
    
    Under the current execution mode of Python UDFs, Python UDFs are not well supported as branch values or as the else value in a CaseWhen expression.
    
    Since fixing this may require a non-trivial change and the issue has a simpler workaround, we should just notify users about it in the documentation.
    
    ## How was this patch tested?
    
    Documentation change only.
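
The behavior described above can be modeled in plain Python, without Spark. This is only a sketch under stated assumptions: `eval_udf_then_case_when` is a hypothetical stand-in for an execution mode that evaluates the function on every row before the conditional picks a branch, and moving the guard into the function is one plausible workaround, not necessarily the one this pull request documents.

```python
def reciprocal(x):
    # Fails on x == 0, which the condition is meant to exclude.
    return 1.0 / x


def eval_udf_then_case_when(rows, cond, func, otherwise=None):
    """Hypothetical model: run `func` on all rows first, then select by `cond`."""
    udf_results = [func(x) for x in rows]  # the function sees every row
    return [r if cond(x) else otherwise for x, r in zip(rows, udf_results)]


def safe_reciprocal(x):
    # Assumed workaround: the condition lives inside the function itself.
    return 1.0 / x if x != 0 else None


rows = [1, 0, 4]

try:
    eval_udf_then_case_when(rows, lambda x: x != 0, reciprocal)
    failed = False
except ZeroDivisionError:
    failed = True  # the guard in `cond` did not protect the function

guarded = eval_udf_then_case_when(rows, lambda x: True, safe_reciprocal)
print(failed, guarded)
```

In this model the external condition cannot stop the function from seeing the zero row, while the guarded variant handles that row itself.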


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-22347-3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19617
    
----
commit a43430b99d0e5aab351467386fe566461b2a4b06
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-10-31T04:28:16Z

    Add document to notice users for using udfs with conditional expressions.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    **[Test build #83261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83261/testReport)** for PR 19617 at commit [`47bb26e`](https://github.com/apache/spark/commit/47bb26e9fe5a9db8cf666ae16a29d82a3e5e6311).


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    retest this please.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    cc @cloud-fan for checking the document too.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83261/
    Test PASSed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    **[Test build #83242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83242/testReport)** for PR 19617 at commit [`a43430b`](https://github.com/apache/spark/commit/a43430b99d0e5aab351467386fe566461b2a4b06).


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    cc @HyukjinKwon @BryanCutler 


---



[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19617#discussion_r147995905
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2185,6 +2185,12 @@ def udf(f=None, returnType=StringType()):
             duplicate invocations may be eliminated or the function may even be invoked more times than
             it is present in the query.
     
    +    .. note:: The user-defined functions do not support conditional execution by using them with
    +        SQL conditional expressions such `when` or `if`. The functions still apply on all rows no
    --- End diff --
    
    Oops! Thanks.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83259/
    Test FAILed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83241/
    Test FAILed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83260/
    Test FAILed.


---



[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19617#discussion_r147961260
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2185,6 +2185,12 @@ def udf(f=None, returnType=StringType()):
             duplicate invocations may be eliminated or the function may even be invoked more times than
             it is present in the query.
     
    +    .. note:: The user-defined functions do not support conditional execution by using them with
    +        SQL conditional expressions such `when` or `if`. The functions still apply on all rows no
    --- End diff --
    
    Looks like a tiny typo: `` expressions such `when` or `if`. `` -> `` expressions such as `when` or `if`. ``.


---



[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19617


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    **[Test build #83242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83242/testReport)** for PR 19617 at commit [`a43430b`](https://github.com/apache/spark/commit/a43430b99d0e5aab351467386fe566461b2a4b06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19617#discussion_r147964546
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2185,6 +2185,12 @@ def udf(f=None, returnType=StringType()):
             duplicate invocations may be eliminated or the function may even be invoked more times than
             it is present in the query.
     
    +    .. note:: The user-defined functions do not support conditional execution by using them with
    --- End diff --
    
    Hm, should we maybe clarify that the output itself is correct as long as the condition does not lead to a runtime failure? Maybe I am worrying too much, but I think the note might mislead readers into thinking the output is incorrect.
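
The distinction raised here can be illustrated with a small plain-Python model. It is a sketch only: `eager_case_when` and `lazy_case_when` are hypothetical helpers, not Spark APIs, and they stand in for "function applied to all rows" versus "function applied only where the condition holds". For a function that never raises, both produce the same output and only the number of invocations differs.

```python
calls = []


def double(x):
    calls.append(x)  # record every row the function is invoked on
    return x * 2


def eager_case_when(rows, cond, func, otherwise=None):
    """Hypothetical model: `func` runs on all rows, `cond` only filters results."""
    results = [func(x) for x in rows]
    return [r if cond(x) else otherwise for x, r in zip(rows, results)]


def lazy_case_when(rows, cond, func, otherwise=None):
    """Reference semantics: `func` runs only where `cond` holds."""
    return [func(x) if cond(x) else otherwise for x in rows]


def is_even(x):
    return x % 2 == 0


rows = [1, 2, 3, 4]

eager = eager_case_when(rows, is_even, double)
eager_calls = len(calls)

calls.clear()
lazy = lazy_case_when(rows, is_even, double)
lazy_calls = len(calls)

print(eager == lazy, eager_calls, lazy_calls)
```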


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    thanks, merging to master!


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    retest this please.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83242/
    Test PASSed.


---



[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19617#discussion_r147995716
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2185,6 +2185,12 @@ def udf(f=None, returnType=StringType()):
             duplicate invocations may be eliminated or the function may even be invoked more times than
             it is present in the query.
     
    +    .. note:: The user-defined functions do not support conditional execution by using them with
    --- End diff --
    
    Yes. I think this is a valid worry. Thanks.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    **[Test build #83261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83261/testReport)** for PR 19617 at commit [`47bb26e`](https://github.com/apache/spark/commit/47bb26e9fe5a9db8cf666ae16a29d82a3e5e6311).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19617
  
    retest this please.


---
