You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/20 21:02:39 UTC

[GitHub] [spark] gengliangwang commented on a diff in pull request #39142: [SPARK-41633][SQL] Identify aggregation expressions in the nodePatterns of PythonUDF

gengliangwang commented on code in PR #39142:
URL: https://github.com/apache/spark/pull/39142#discussion_r1053745789


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala:
##########
@@ -64,7 +64,16 @@ case class PythonUDF(
 
   override def toString: String = s"$name(${children.mkString(", ")})#${resultId.id}$typeSuffix"
 
-  final override val nodePatterns: Seq[TreePattern] = Seq(PYTHON_UDF)
+  private def nodePatternsOfPythonFunction: Option[TreePattern] = {

Review Comment:
   Good point. I just did a simple refactoring on the method `PythonUDF.isGroupedAggPandasUDFEvalType` to check whether an eval type is a pandas aggregation function. 
   I added a comment for developers as well.
   
   I asked @sigmod a similar question while reviewing PRs under https://issues.apache.org/jira/browse/SPARK-35042. The answer is developer needs to maintain the node patterns and create more test cases to make sure no regressions happens.
   
   We can add a test case to go over all the `PythonEvalType` as well. But given it is written in the following way
   ```
   private[spark] object PythonEvalType {
     val NON_UDF = 0
   
     val SQL_BATCHED_UDF = 100
   
     val SQL_SCALAR_PANDAS_UDF = 200
     val SQL_GROUPED_MAP_PANDAS_UDF = 201
     val SQL_GROUPED_AGG_PANDAS_UDF = 202
     val SQL_WINDOW_AGG_PANDAS_UDF = 203
     val SQL_SCALAR_PANDAS_ITER_UDF = 204
     val SQL_MAP_PANDAS_ITER_UDF = 205
     val SQL_COGROUPED_MAP_PANDAS_UDF = 206
     val SQL_MAP_ARROW_ITER_UDF = 207
     val SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE = 208
   ```
   I am still thinking about how to refactor and write tests to enumerate all the types for future-proofing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org