You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/12/02 02:16:02 UTC

[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23153#discussion_r238082589
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -155,19 +155,20 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
     
     /**
    - * PythonUDF in join condition can not be evaluated, this rule will detect the PythonUDF
    - * and pull them out from join condition. For python udf accessing attributes from only one side,
    - * they are pushed down by operation push down rules. If not (e.g. user disables filter push
    - * down rules), we need to pull them out in this rule too.
    + * PythonUDF in join condition can't be evaluated if it refers to attributes from both join sides.
    + * See `ExtractPythonUDFs` for details. This rule will detect un-evaluable PythonUDF and pull them
    + * out from join condition.
      */
     object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
    -  def hasPythonUDF(expression: Expression): Boolean = {
    -    expression.collectFirst { case udf: PythonUDF => udf }.isDefined
    +
    +  private def hasUnevaluablePythonUDF(expr: Expression, j: Join): Boolean = {
    +    expr.find { e =>
    +      PythonUDF.isScalarPythonUDF(e) && !canEvaluate(e, j.left) && !canEvaluate(e, j.right)
    --- End diff --
    
    It's only possible to have scalar UDF in join condition, so changing it to `e.isInstanceOf[PythonUDF]` is same.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org