Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/01/23 07:14:28 UTC

[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20360

    [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs from aggregate

    ## What changes were proposed in this pull request?
    
    The `ExtractPythonUDFFromAggregate` rule extracts Python UDFs in a logical aggregate that depend on aggregate expressions or grouping keys. Python UDFs that depend on neither, such as zero-parameter UDFs, should also be extracted to avoid the issue reported in the JIRA.
    
    A minimal snippet that reproduces the issue:
    ```python
    import pyspark.sql.functions as f
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,2), (3,4)])
    f_udf = f.udf(lambda: str("const_str"))
    df2 = df.distinct().withColumn("a", f_udf())
    df2.show()
    ```
    
    The following exception is raised:
    ```
    : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: pythonUDF0#50
            at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
            at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:91)
            at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:90)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
            at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
            at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
            at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
            at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
            at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
            at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
            at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:90)
            at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$38.apply(HashAggregateExec.scala:514)
            at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$38.apply(HashAggregateExec.scala:513)
    ```
    
    The exception is raised because `HashAggregateExec` tries to bind the aliased Python UDF expression (e.g., `pythonUDF0#50 AS a#44`) to the grouping keys.
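To make the failure mode concrete, here is a toy model in plain Python, not Spark code: an operator binds each attribute it evaluates to an ordinal in its input, and an attribute missing from that input cannot be bound. The `bind_reference` helper and the attribute names with `#` IDs are hypothetical, only loosely modelled on Catalyst's `BindReferences`; the real aggregate binds against its grouping attributes plus its aggregate results, and `distinct()` has no aggregate results.

```python
# Toy model (not Spark code): binding replaces each attribute an operator
# evaluates with its ordinal in the operator's input row. Attribute names and
# expression IDs below are made up for illustration.


class BindingError(Exception):
    pass


def bind_reference(attr, input_attrs):
    """Return the ordinal of attr in input_attrs, failing if it is absent."""
    if attr not in input_attrs:
        raise BindingError(f"Binding attribute, tree: {attr}")
    return input_attrs.index(attr)


# df.distinct() on two columns is an aggregate grouped by both columns with no
# aggregate functions, so its output expressions can only bind to those keys.
grouping_keys = ["_1#0", "_2#1"]

# Projecting the grouping keys themselves binds fine.
ordinals = [bind_reference(a, grouping_keys) for a in grouping_keys]

# The merged projection `pythonUDF0#50 AS a#44` refers to the extracted UDF
# result, which is neither a grouping key nor an aggregate result.
message = ""
try:
    bind_reference("pythonUDF0#50", grouping_keys)
except BindingError as err:
    message = str(err)

print(ordinals)  # [0, 1]
print(message)   # Binding attribute, tree: pythonUDF0#50
```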
    
    ## How was this patch tested?
    
    Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23177

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20360
    
----
commit b6cb6218e539589f37ff8648dff068bef6e810e5
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-01-23T05:56:45Z

    Extract parameter-less UDFs from aggregate.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/135/
    Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Build finished. Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/130/
    Test PASSed.


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163235275
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    --- End diff --
    
    Shall we update the classdoc too? It currently says `Extracts all the Python UDFs in logical aggregate, which depends on aggregate expression or grouping key, evaluate them after aggregate`


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86551/testReport)** for PR 20360 at commit [`74684a7`](https://github.com/apache/spark/commit/74684a7d10009ef970d7d674d9c695b695c5da5c).


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86523/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163223074
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    +      e => PythonUDF.isScalarPythonUDF(e) &&
    +        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
    --- End diff --
    
    I want to cover literal inputs as well, like `df2 = df.distinct().withColumn("a", f_udf(f.lit("2")))`.
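A toy Python sketch of the predicate change in `hasPythonUdfOverAggregate`, assuming a hypothetical mini expression tree (`Expr`, `Attr`, `Lit`, `PythonUDF`) and a simplified `belongs_to_aggregate` that only recognises grouping-key attributes (the real `belongAggregate` also matches aggregate expressions). Under these assumptions the old check misses a UDF with no arguments or only literal arguments, while the added `references.isEmpty` branch catches both.

```python
# Toy expression tree (hypothetical, not Catalyst's classes) used to compare the
# old and new extraction predicates in hasPythonUdfOverAggregate.


class Expr:
    children = ()

    def find(self, pred):
        """Pre-order search returning the first matching node, or None."""
        if pred(self):
            return self
        for child in self.children:
            hit = child.find(pred)
            if hit is not None:
                return hit
        return None

    @property
    def references(self):
        """Names of all attributes referenced anywhere in this subtree."""
        refs = set()
        for child in self.children:
            refs |= child.references
        return refs


class Attr(Expr):
    def __init__(self, name):
        self.name = name

    @property
    def references(self):
        return {self.name}


class Lit(Expr):
    def __init__(self, value):
        self.value = value


class PythonUDF(Expr):
    def __init__(self, *args):
        self.children = tuple(args)


def belongs_to_aggregate(e, grouping_keys):
    # Simplification: only grouping-key attributes count here; the real rule
    # also treats aggregate expressions as belonging to the aggregate.
    return isinstance(e, Attr) and e.name in grouping_keys


def uses_aggregate(udf, grouping_keys):
    return udf.find(lambda x: belongs_to_aggregate(x, grouping_keys)) is not None


def has_udf_over_aggregate_old(expr, grouping_keys):
    return expr.find(
        lambda e: isinstance(e, PythonUDF) and uses_aggregate(e, grouping_keys)
    ) is not None


def has_udf_over_aggregate_new(expr, grouping_keys):
    # New branch: a UDF that references no attribute is extracted as well.
    return expr.find(
        lambda e: isinstance(e, PythonUDF)
        and (not e.references or uses_aggregate(e, grouping_keys))
    ) is not None


keys = {"_1", "_2"}
cases = [PythonUDF(Attr("_1")), PythonUDF(), PythonUDF(Lit("2"))]
old = [has_udf_over_aggregate_old(c, keys) for c in cases]
new = [has_udf_over_aggregate_new(c, keys) for c in cases]
print(old)  # [True, False, False]
print(new)  # [True, True, True]
```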


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86551/
    Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    @HyukjinKwon Ok. I will open a backport later.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86523/
    Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/159/
    Test PASSed.


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163228866
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    +      e => PythonUDF.isScalarPythonUDF(e) &&
    +        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
    --- End diff --
    
    I think `references` is more correct. If we used `children`, a UDF whose argument is, for example, a literal would have non-empty `children` while not belonging to the aggregate expression either, so we would not extract it and `HashAggregateExec` would still try to bind the Python UDF. Its `references`, on the other hand, is empty, so the `references` check does extract it.
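The `children` versus `references` difference can be illustrated with another toy model (hypothetical classes, not Catalyst's): a UDF applied to a literal has a child, so a `children`-emptiness test would skip it, but it references no attribute, so a `references`-emptiness test catches it.

```python
# Toy nodes (hypothetical, not Catalyst's classes) contrasting the two
# candidate emptiness checks discussed for the extraction rule.


class Attr:
    children = ()

    def __init__(self, name):
        self.name = name

    @property
    def references(self):
        return {self.name}


class Lit:
    children = ()

    def __init__(self, value):
        self.value = value

    @property
    def references(self):
        return set()


class PythonUDF:
    def __init__(self, *args):
        self.children = tuple(args)

    @property
    def references(self):
        refs = set()
        for child in self.children:
            refs |= child.references
        return refs


def empty_children(udf):
    return len(udf.children) == 0


def empty_references(udf):
    return len(udf.references) == 0


cases = {
    "f_udf()": PythonUDF(),
    'f_udf(f.lit("2"))': PythonUDF(Lit("2")),
    "f_udf(col)": PythonUDF(Attr("col")),
}
for label, udf in cases.items():
    print(f"{label}: children empty={empty_children(udf)}, "
          f"references empty={empty_references(udf)}")
```

Only the literal case differs between the two checks; the column case is left to the existing `belongAggregate` test.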


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged to master.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by hankim <gi...@git.apache.org>.

Github user hankim commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Is there any workaround for this? My environment hasn't been upgraded to 2.3.0 yet, and I have exactly the code from the JIRA ticket (http://mail-archives.apache.org/mod_mbox/spark-issues/201801.mbox/%3CJIRA.13132665.1516622460000.6681.1516622520346@Atlassian.JIRA%3E), 
    i.e., assigning a UUID with a UDF after a distinct() call.
    Thank you!
    cc @viirya @HyukjinKwon @cloud-fan


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163229202
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    +      e => PythonUDF.isScalarPythonUDF(e) &&
    +        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
    --- End diff --
    
    Sorry, I wrote a duplicate comment and then removed it. Yours didn't show up while I was writing mine.


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163189081
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    +      e => PythonUDF.isScalarPythonUDF(e) &&
    +        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
    --- End diff --
    
    Can we use just `e.children` instead of `e.references`?


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20360


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    @hankim maybe like:
    
    ```python
    import pyspark.sql.functions as f
    import uuid
    
    df = spark.createDataFrame([(1,2), (3,4)])
    f_udf = f.udf(lambda: str(uuid.uuid4()))
    # Cache the distinct() result first, then add the UDF column on top of it.
    df2 = df.distinct().cache()
    df3 = df2.withColumn("a", f_udf())
    df3.show()
    ```


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86551/testReport)** for PR 20360 at commit [`74684a7`](https://github.com/apache/spark/commit/74684a7d10009ef970d7d674d9c695b695c5da5c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/132/
    Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    cc @HyukjinKwon @cloud-fan 


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86518/
    Test FAILed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    LGTM


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86520/
    Test FAILed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86518/testReport)** for PR 20360 at commit [`b6cb621`](https://github.com/apache/spark/commit/b6cb6218e539589f37ff8648dff068bef6e810e5).


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163234660
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    +      e => PythonUDF.isScalarPythonUDF(e) &&
    +        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
    --- End diff --
    
    Oh, I see, sounds good. Thanks!


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    retest this please.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    @viirya, would you mind opening a backport to branch-2.3?


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20360#discussion_r163410447
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     
       private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
         expr.find {
    -      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
    --- End diff --
    
    Yes. Updated.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    LGTM


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    Build finished. Test FAILed.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86523/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86518/testReport)** for PR 20360 at commit [`b6cb621`](https://github.com/apache/spark/commit/b6cb6218e539589f37ff8648dff068bef6e810e5).
     * This patch **fails Python style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86520/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20360
  
    **[Test build #86520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86520/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).

