Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/19 14:36:45 UTC

[GitHub] [spark] viirya opened a new pull request #25204: [SPARK-28441][SQL][Python] Fix error when PythonUDF is used in correlated scalar subquery

URL: https://github.com/apache/spark/pull/25204
 
 
   ## What changes were proposed in this pull request?
   
   In SPARK-15370, we added a check on the expression at the root of the correlated subquery in order to fix the count bug. If a `PythonUDF` is in the checking path, evaluating it fails because we can't statically evaluate a `PythonUDF`. The Python UDF test added in SPARK-28277 exposes this issue.
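
   For readers unfamiliar with the count bug, the following is a minimal toy sketch in plain Python, not Spark code. The table contents and the function names (`correlated_count`, `decorrelated_count`) and the `empty_value` parameter are all hypothetical. It only illustrates why rewriting a correlated `count` subquery as an aggregate plus left outer join needs a replacement value for unmatched outer rows.

```python
# Toy tables as lists of dict rows (hypothetical data, not from the PR).
outer_rows = [{"k": "one"}, {"k": "two"}]
inner_rows = [{"k": "one", "v": 1}, {"k": "one", "v": 5}]


def correlated_count(outer, inner):
    """Reference semantics: evaluate the count subquery once per outer row."""
    return {o["k"]: sum(1 for i in inner if i["k"] == o["k"]) for o in outer}


def decorrelated_count(outer, inner, empty_value=None):
    """Rewritten form: aggregate the inner table by key, then left outer join.

    Outer rows without a match get NULL (None) from the join unless
    `empty_value`, the subquery's result on empty input, is substituted.
    """
    counts = {}
    for i in inner:
        counts[i["k"]] = counts.get(i["k"], 0) + 1
    return {o["k"]: counts.get(o["k"], empty_value) for o in outer}


expected = correlated_count(outer_rows, inner_rows)      # "two" has count 0
naive = decorrelated_count(outer_rows, inner_rows)       # "two" becomes None
fixed = decorrelated_count(outer_rows, inner_rows, 0)    # NULL replaced by 0
```

   In this sketch `naive` disagrees with `expected` for the key without matches, while `fixed` agrees once the NULL is replaced by the value `count` returns on empty input.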
   
   If we can statically evaluate the expression, we intercept NULL values coming from the outer join and replace them with the value that the subquery's expression would return, as before. If we cannot, we replace them with the `PythonUDF` expression, with statically evaluated parameters.
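
   The rule described above can be pictured with a toy expression tree. This is a sketch under stated assumptions, not the Catalyst implementation: the classes `Literal`, `Agg`, `Add`, `OpaqueUDF` and the function `fold_on_empty` are hypothetical names, and `OpaqueUDF` merely stands in for an expression that cannot be evaluated at planning time.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


class Expr:
    """Base class for the toy expression tree (hypothetical, not Catalyst)."""


@dataclass(frozen=True)
class Literal(Expr):
    value: Any


@dataclass(frozen=True)
class Agg(Expr):
    name: str  # e.g. "count", "max", "min"


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class OpaqueUDF(Expr):
    func: Callable[..., Any]  # cannot be run at planning time in this model
    args: Tuple[Expr, ...]


def fold_on_empty(expr: Expr) -> Expr:
    """Evaluate `expr` as if the subquery saw no rows, as far as possible.

    Returns a Literal when everything folds statically; otherwise returns a
    residual expression whose foldable parts have been replaced by Literals.
    """
    if isinstance(expr, Literal):
        return expr
    if isinstance(expr, Agg):
        # count over empty input is 0; other aggregates here yield NULL.
        return Literal(0 if expr.name == "count" else None)
    if isinstance(expr, Add):
        left, right = fold_on_empty(expr.left), fold_on_empty(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            if left.value is None or right.value is None:
                return Literal(None)  # NULL propagates through addition
            return Literal(left.value + right.value)
        return Add(left, right)  # a child did not fold; keep the residual
    if isinstance(expr, OpaqueUDF):
        # Do not call the UDF; only fold its parameters.
        return OpaqueUDF(expr.func, tuple(fold_on_empty(a) for a in expr.args))
    raise TypeError(f"unsupported expression: {expr!r}")


def double(x):
    """Hypothetical stand-in for a user-defined Python function."""
    return None if x is None else 2 * x


foldable = fold_on_empty(Add(Agg("count"), Literal(1)))
residual = fold_on_empty(OpaqueUDF(double, (Agg("max"),)))
```

   Here `foldable` collapses to a single literal, whereas `residual` keeps the UDF node with a literal NULL parameter, which is the shape of replacement expression the paragraph above describes for the non-foldable case.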
   
   After this change, the last query in `udf-except.sql`, which previously threw `java.lang.UnsupportedOperationException`, runs successfully:
   
   ```
   SELECT t1.k
   FROM   t1
   WHERE  t1.v <= (SELECT   udf(max(udf(t2.v)))
                   FROM     t2
                   WHERE    udf(t2.k) = udf(t1.k))
   MINUS
   SELECT t1.k
   FROM   t1
   WHERE  udf(t1.v) >= (SELECT   min(udf(t2.v))
                   FROM     t2
                   WHERE    t2.k = t1.k)
   -- !query 2 schema
   struct<k:string>
   -- !query 2 output
   two
   ```
   
   ## How was this patch tested?
   
   Added tests.
