
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/30 21:03:04 UTC

[GitHub] [spark] viirya commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

viirya commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r624203049



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##########
@@ -23,39 +23,76 @@
 /**
  * Interface for a function that produces a result value for each input row.
  * <p>
- * To evaluate each input row, Spark will first try to lookup and use a "magic method" (described
- * below) through Java reflection. If the method is not found, Spark will call
- * {@link #produceResult(InternalRow)} as a fallback approach.
+ * To evaluate each input row, Spark will first try to look up and use either a static or
+ * non-static "magic method" (described below) through Java reflection. If neither magic
+ * method is found, Spark will call {@link #produceResult(InternalRow)} as a fallback
+ * approach. In other words, the precedence is as follows:
+ * <ul>
+ *   <li>static magic method</li>
+ *   <li>non-static magic method</li>
+ *   <li>{@link #produceResult(InternalRow)}</li>
+ * </ul>
  * <p>
  * The JVM type of result values produced by this function must be the type used by Spark's
  * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * The mapping between {@link DataType} and the corresponding JVM type is defined below.
  * <p>
  * <b>IMPORTANT</b>: the default implementation of {@link #produceResult} throws
  * {@link UnsupportedOperationException}. Users can choose to override this method, or implement
- * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters
- * instead of a {@link InternalRow}. The magic method will be loaded by Spark through Java
- * reflection and will also provide better performance in general, due to optimizations such as
- * codegen, removal of Java boxing, etc.
- *
- * For example, a scalar UDF for adding two integers can be defined as follow with the magic
+ * a static magic method with name {@link #STATIC_MAGIC_METHOD_NAME}, or non-static magic

Review comment:
       Does the static magic method have a significant benefit over the non-static one? We shouldn't create the UDF object per row, so I think the cost of calling the non-static magic method is not very different from the static one.
   
   The benchmark also doesn't show much difference.
   
   Three different entry points to the UDF API look a bit verbose.
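To make the precedence under discussion concrete, here is a minimal, self-contained Java sketch of reflection-based resolution: it looks for a static magic method first, then a non-static one, and otherwise falls back to a row-based `produceResult`. It is not Spark's implementation. The method names `staticInvoke` and `invoke`, the `Fn` interface, the example classes, and modelling a row as `Object[]` are all hypothetical, chosen only for illustration. The method is resolved once and a single function instance is reused as the receiver for every row, which is the setup the comment above assumes when arguing that the non-static call should cost little more than the static one.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// Hypothetical sketch of magic-method resolution; NOT Spark's implementation.
// The names "staticInvoke"/"invoke", the Fn interface and rows-as-Object[] are assumptions.
public class MagicMethodResolver {

  static final String STATIC_MAGIC = "staticInvoke"; // hypothetical static magic method name
  static final String MAGIC = "invoke";              // hypothetical non-static magic method name

  // Stand-in for ScalarFunction: the row-based fallback throws unless overridden.
  public interface Fn {
    default Object produceResult(Object[] row) {
      throw new UnsupportedOperationException("produceResult is not implemented");
    }
  }

  // Adds two ints through a static magic method.
  public static class StaticAdd implements Fn {
    public static int staticInvoke(int a, int b) { return a + b; }
  }

  // Adds two ints through a non-static magic method.
  public static class InstanceAdd implements Fn {
    public int invoke(int a, int b) { return a + b; }
  }

  // Defines both magic methods; the static one takes precedence.
  public static class BothAdd implements Fn {
    public static int staticInvoke(int a, int b) { return a + b; }
    public int invoke(int a, int b) { return a + b; }
  }

  // Implements only the row-based fallback.
  public static class RowAdd implements Fn {
    @Override
    public Object produceResult(Object[] row) {
      return (Integer) row[0] + (Integer) row[1];
    }
  }

  // Precedence: static magic method, then non-static magic method, else null (use fallback).
  public static Method resolve(Class<?> cls, Class<?>... paramTypes) {
    try {
      Method m = cls.getMethod(STATIC_MAGIC, paramTypes);
      if (Modifier.isStatic(m.getModifiers())) return m;
    } catch (NoSuchMethodException ignored) {
      // no static magic method; try the next option
    }
    try {
      Method m = cls.getMethod(MAGIC, paramTypes);
      if (!Modifier.isStatic(m.getModifiers())) return m;
    } catch (NoSuchMethodException ignored) {
      // no non-static magic method either
    }
    return null;
  }

  // Resolves once, then evaluates every row against one reused function instance.
  public static Object[] evaluateAll(Fn fn, Object[][] rows) {
    Method m = resolve(fn.getClass(), int.class, int.class);
    // A static method needs no receiver; otherwise the same instance serves every row.
    Object receiver = (m != null && Modifier.isStatic(m.getModifiers())) ? null : fn;
    Object[] out = new Object[rows.length];
    try {
      for (int i = 0; i < rows.length; i++) {
        out[i] = (m != null) ? m.invoke(receiver, rows[i]) : fn.produceResult(rows[i]);
      }
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    Object[][] rows = { {1, 2}, {10, 20} };
    Fn[] fns = { new StaticAdd(), new InstanceAdd(), new BothAdd(), new RowAdd() };
    for (Fn fn : fns) {
      Method m = resolve(fn.getClass(), int.class, int.class);
      String path = (m == null) ? "produceResult" : m.getName();
      System.out.println(fn.getClass().getSimpleName() + " via " + path + ": "
          + Arrays.toString(evaluateAll(fn, rows)));
    }
  }
}
```

Running `main` prints one line per function: each takes a different entry point (`BothAdd` resolves to `staticInvoke`), and all four print `[3, 30]`. The sketch only shows the lookup order and receiver reuse; it says nothing about relative performance, which would need a benchmark.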




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


