Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/03 17:26:54 UTC

[GitHub] [spark] peter-toth commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

peter-toth commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516836683



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
      I think @cloud-fan was referring to the fact that if we change the default to non-deterministic, then some of the optimization rules will not handle those UDF expressions and will leave them untouched. E.g. `PushDownPredicates` will not push them down, which can cause a performance regression.
   
   IMHO, it is the user's responsibility to set the deterministic flag correctly regardless of what the default is. And if a UDF is flagged deterministic, we should apply the optimizations.
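   To illustrate the point of the PR, here is a minimal plain-Python sketch of the deduplication idea (not Spark's actual implementation; the `UdfCall` type and its `deterministic` flag are hypothetical stand-ins for `PythonUDF` and its canonical form): deterministic calls with the same canonical form can be evaluated once, while non-deterministic calls must never be merged, since each invocation may yield a different value.

```python
# Sketch only: deduplicate deterministic UDF calls by a canonical key.
# `UdfCall` is an illustrative stand-in for Spark's PythonUDF expression.
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfCall:
    name: str
    args: tuple
    deterministic: bool = True


def deduplicate(calls):
    """Return the calls that actually need evaluation.

    Deterministic calls sharing the same canonical key (name + args) are
    evaluated once and reused; non-deterministic calls are always kept,
    because merging them would change query semantics.
    """
    seen = set()
    result = []
    for call in calls:
        key = (call.name, call.args)
        if call.deterministic and key in seen:
            continue  # reuse the earlier evaluation of the identical call
        seen.add(key)
        result.append(call)
    return result
```

   For example, two identical deterministic calls collapse into one evaluation, while two calls to a non-deterministic UDF (e.g. one based on `rand()`) both survive.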




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org