Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/12/02 02:16:02 UTC
[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23153#discussion_r238082589
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
@@ -155,19 +155,20 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
}
/**
- * PythonUDF in join condition can not be evaluated, this rule will detect the PythonUDF
- * and pull them out from join condition. For python udf accessing attributes from only one side,
- * they are pushed down by operation push down rules. If not (e.g. user disables filter push
- * down rules), we need to pull them out in this rule too.
+ * PythonUDF in join condition can't be evaluated if it refers to attributes from both join sides.
+ * See `ExtractPythonUDFs` for details. This rule will detect un-evaluable PythonUDF and pull them
+ * out from join condition.
*/
object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
- def hasPythonUDF(expression: Expression): Boolean = {
- expression.collectFirst { case udf: PythonUDF => udf }.isDefined
+
+ private def hasUnevaluablePythonUDF(expr: Expression, j: Join): Boolean = {
+ expr.find { e =>
+ PythonUDF.isScalarPythonUDF(e) && !canEvaluate(e, j.left) && !canEvaluate(e, j.right)
--- End diff --
Only scalar UDFs can appear in a join condition, so changing this check to `e.isInstanceOf[PythonUDF]` would be equivalent.
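To make the check under discussion concrete, here is a minimal, self-contained sketch of the "references attributes from both join sides" test. It is a toy model rather than Spark's Catalyst code: `Expr`, `Attr`, `PyUdf`, `EqualTo`, `ToyJoin` and `ToyJoinCheck` are hypothetical names invented for illustration, and `canEvaluate` here is just a subset check on referenced attribute names. The match on the `PyUdf` type plays the role of the suggested `e.isInstanceOf[PythonUDF]`.

```scala
// Toy model only: none of these types are Spark classes.
sealed trait Expr {
  def children: Seq[Expr]

  // Attribute names referenced anywhere in this expression tree.
  def references: Set[String] = children.flatMap(_.references).toSet

  // Depth-first search for the first node that satisfies the predicate.
  def find(p: Expr => Boolean): Option[Expr] =
    if (p(this)) Some(this)
    else children.foldLeft(Option.empty[Expr])((acc, c) => acc.orElse(c.find(p)))
}

final case class Attr(name: String) extends Expr {
  val children: Seq[Expr] = Nil
  override def references: Set[String] = Set(name)
}

// Stand-in for a Python UDF call (hypothetical).
final case class PyUdf(name: String, children: Seq[Expr]) extends Expr

final case class EqualTo(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

// Stand-in for a join, reduced to the attribute names each side produces.
final case class ToyJoin(leftOutput: Set[String], rightOutput: Set[String])

object ToyJoinCheck {
  // An expression can be evaluated on one side if that side supplies
  // every attribute the expression references.
  def canEvaluate(e: Expr, output: Set[String]): Boolean =
    e.references.subsetOf(output)

  // True if the condition contains a UDF that neither side can evaluate alone.
  def hasUnevaluablePythonUDF(cond: Expr, j: ToyJoin): Boolean =
    cond.find {
      case u: PyUdf => !canEvaluate(u, j.leftOutput) && !canEvaluate(u, j.rightOutput)
      case _        => false
    }.isDefined
}
```

In this sketch, a UDF over attributes from both sides of the join is flagged, while a UDF that reads only one side is not, even when the enclosing comparison spans both sides.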
---