Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/17 07:47:35 UTC

[GitHub] [spark] beliefer opened a new pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

beliefer opened a new pull request #34303:
URL: https://github.com/apache/spark/pull/34303


   ### What changes were proposed in this pull request?
   Spark SQL does not yet support creating a function from an `Aggregator`, and `UserDefinedAggregateFunction` is deprecated.
   If we want to remove `UserDefinedAggregateFunction`, Spark SQL should provide an alternative.
   
   
   ### Why are the changes needed?
   We need to provide a new way to create user-defined aggregate functions so that `UserDefinedAggregateFunction` can be removed in the future.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Users can create a user-defined aggregate function by implementing `Aggregator`.
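
   For illustration only, here is a minimal sketch of what implementing `Aggregator` can look like. The class name `DoubleAverage` and the function name `double_avg` are hypothetical and do not come from this PR. Registration below goes through the existing `functions.udaf` API; the `CREATE FUNCTION ... AS '<class>'` path is what this PR proposes, and it would presumably need a class with a no-argument constructor, as this one has.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Hypothetical example class (not part of the PR): averages Double inputs.
// The buffer holds (running sum, row count).
class DoubleAverage extends Aggregator[Double, (Double, Long), Double] {
  // Empty buffer: nothing summed, nothing counted.
  def zero: (Double, Long) = (0.0, 0L)

  // Fold one input value into a partial buffer.
  def reduce(buf: (Double, Long), value: Double): (Double, Long) =
    (buf._1 + value, buf._2 + 1L)

  // Combine two partial buffers produced on different partitions.
  def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)

  // Turn the final buffer into the result; NaN when no rows were seen.
  def finish(buf: (Double, Long)): Double =
    if (buf._2 == 0L) Double.NaN else buf._1 / buf._2

  def bufferEncoder: Encoder[(Double, Long)] =
    Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)

  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object RegisterDoubleAverage {
  // Registers the aggregator under a hypothetical SQL name using the
  // existing udaf(...) wrapper, not the CREATE FUNCTION path from this PR.
  def register(spark: SparkSession): Unit = {
    spark.udf.register("double_avg", udaf(new DoubleAverage))
  }
}
```

   After `RegisterDoubleAverage.register(spark)`, the function can be called from SQL, for example `SELECT double_avg(col1) FROM VALUES (1.0), (2.0), (3.0)`.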
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946401262


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48867/
   




[GitHub] [spark] beliefer closed pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer closed pull request #34303:
URL: https://github.com/apache/spark/pull/34303


   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945357009


   **[Test build #144346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144346/testReport)** for PR 34303 at commit [`7b11c6e`](https://github.com/apache/spark/commit/7b11c6e67e4dac72b35c08647855de1da01b0490).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945407476


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48824/
   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945721470


   **[Test build #144366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144366/testReport)** for PR 34303 at commit [`136b8f5`](https://github.com/apache/spark/commit/136b8f50e699ea026144c434dda1e9cffe90ae64).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945722718


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144366/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945088086


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48809/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945590134


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144365/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945590134


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144365/
   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946382163


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48867/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945088086


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48809/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945415888


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144346/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946408459








[GitHub] [spark] HyukjinKwon commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730509454



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala
##########
@@ -179,4 +181,24 @@ class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-37018: Spark SQL should support create function with Aggregator") {
+    val avgFuncClass = "org.apache.spark.sql.hive.execution.MyDoubleAverage"
+    val functionName = "test_udf"
+    withTempDatabase { dbName =>
+      withUserDefinedFunction(
+        s"default.$functionName" -> false,
+        s"$dbName.$functionName" -> false,
+        functionName -> true) {
+        // create a function in default database
+        sql("USE DEFAULT")
+        sql(s"CREATE FUNCTION $functionName AS '$avgFuncClass'")
+        // create a view using a function in 'default' database
+        withView("v1") {
+          sql(s"CREATE VIEW v1 AS SELECT $functionName(col1) AS func FROM VALUES (1), (2), (3)")
+          checkAnswer(sql(s"SELECT * FROM v1"), Seq(Row(102.0)))

Review comment:
       nit
   ```suggestion
             checkAnswer(sql("SELECT * FROM v1"), Seq(Row(102.0)))
   ```






[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945373601


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48824/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945357009


   **[Test build #144346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144346/testReport)** for PR 34303 at commit [`7b11c6e`](https://github.com/apache/spark/commit/7b11c6e67e4dac72b35c08647855de1da01b0490).




[GitHub] [spark] cloud-fan commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730719434



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +124,58 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
+        // Unfortunately we need to use reflection here because Aggregator
+        // and ScalaAggregator are defined in sql/core module.

Review comment:
       Or do you want to move the code to `SessionCatalog`? Then reflection makes sense.






[GitHub] [spark] beliefer commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r731472758



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +125,52 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
         case _: InvalidUDFClassException =>
-          makeHiveFunctionExpression(name, clazz, input)
+          val clsForAggregator = classOf[Aggregator[_, _, _]]
+          if (clsForAggregator.isAssignableFrom(clazz)) {
+            val clsForEncoder = classOf[ExpressionEncoder[_]]
+            val aggregator = clazz.getConstructor().newInstance().asInstanceOf[Aggregator[_, _, _]]
+            // Construct the input encoder
+            val mirror = runtimeMirror(clazz.getClassLoader)
+            val classType = mirror.classSymbol(clazz)
+            val baseClassType = typeOf[Aggregator[_, _, _]].typeSymbol.asClass
+            val baseType = internal.thisType(classType).baseType(baseClassType)
+            val tpe = baseType.typeArgs.head
+            val cls = mirror.runtimeClass(tpe)

Review comment:
       The code references https://github.com/apache/spark/blob/4072a22aa2bf15e95d3043f937a3468057f4fd36/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L55
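
       As a standalone sketch of the technique in the hunk above (recovering the first type argument that a concrete class passes to a generic base class, then mapping it to a runtime `Class`), the following uses only `scala-reflect`. `GenericBase`, `StringImpl`, and `TypeArgDemo` are hypothetical stand-ins for `Aggregator` and a user class, and the sketch calls `classSymbol.toType` rather than the `internal.thisType(...)` used in the diff, so treat it as an approximation rather than the PR's code.

```scala
import scala.reflect.runtime.universe._

// Hypothetical stand-ins for Aggregator[IN, BUF, OUT] and a user subclass.
abstract class GenericBase[IN, BUF, OUT]
class StringImpl extends GenericBase[String, Long, Double]

object TypeArgDemo {
  // Returns the runtime class of the first type argument that `clazz`
  // supplies to GenericBase.
  def firstTypeArg(clazz: Class[_]): Class[_] = {
    val mirror = runtimeMirror(clazz.getClassLoader)
    // Scala symbol for the concrete class.
    val classSym = mirror.classSymbol(clazz)
    // Symbol of the generic base class, ignoring its type arguments.
    val baseSym = typeOf[GenericBase[_, _, _]].typeSymbol.asClass
    // View the concrete class as an instance of the base type,
    // e.g. GenericBase[String, Long, Double].
    val baseType = classSym.toType.baseType(baseSym)
    // Erase the first type argument to a JVM class.
    mirror.runtimeClass(baseType.typeArgs.head)
  }
}
```

       Calling `TypeArgDemo.firstTypeArg(classOf[StringImpl])` should yield the class for `String`. Type arguments that are themselves generic are erased by `runtimeClass`, which is one reason an encoder built this way can lose information.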






[GitHub] [spark] SparkQA removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945632911


   **[Test build #144366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144366/testReport)** for PR 34303 at commit [`136b8f5`](https://github.com/apache/spark/commit/136b8f50e699ea026144c434dda1e9cffe90ae64).




[GitHub] [spark] beliefer commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945343903


   ping @cloud-fan 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945717400


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48841/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730973287



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +125,52 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
         case _: InvalidUDFClassException =>
-          makeHiveFunctionExpression(name, clazz, input)
+          val clsForAggregator = classOf[Aggregator[_, _, _]]
+          if (clsForAggregator.isAssignableFrom(clazz)) {
+            val clsForEncoder = classOf[ExpressionEncoder[_]]
+            val aggregator = clazz.getConstructor().newInstance().asInstanceOf[Aggregator[_, _, _]]
+            // Construct the input encoder
+            val mirror = runtimeMirror(clazz.getClassLoader)
+            val classType = mirror.classSymbol(clazz)
+            val baseClassType = typeOf[Aggregator[_, _, _]].typeSymbol.asClass
+            val baseType = internal.thisType(classType).baseType(baseClassType)
+            val tpe = baseType.typeArgs.head
+            val cls = mirror.runtimeClass(tpe)

Review comment:
       Did you copy the code above from somewhere?






[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945415072


   **[Test build #144346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144346/testReport)** for PR 34303 at commit [`7b11c6e`](https://github.com/apache/spark/commit/7b11c6e67e4dac72b35c08647855de1da01b0490).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945704587


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48841/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730718372



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +124,58 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
+        // Unfortunately we need to use reflection here because Aggregator
+        // and ScalaAggregator are defined in sql/core module.

Review comment:
       Classes in sql/core are available in sql/hive. What problem did you hit?






[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945590083


   **[Test build #144365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144365/testReport)** for PR 34303 at commit [`136b8f5`](https://github.com/apache/spark/commit/136b8f50e699ea026144c434dda1e9cffe90ae64).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945584651


   **[Test build #144365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144365/testReport)** for PR 34303 at commit [`136b8f5`](https://github.com/apache/spark/commit/136b8f50e699ea026144c434dda1e9cffe90ae64).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945673916


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48840/
   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945661442


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48840/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945717400


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48841/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945722718


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144366/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946408461








[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945072710


   **[Test build #144330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144330/testReport)** for PR 34303 at commit [`3702b9b`](https://github.com/apache/spark/commit/3702b9b1ac974810fff4491d2fb0130712578306).




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945632911


   **[Test build #144366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144366/testReport)** for PR 34303 at commit [`136b8f5`](https://github.com/apache/spark/commit/136b8f50e699ea026144c434dda1e9cffe90ae64).




[GitHub] [spark] beliefer commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r731472047



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +125,52 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
         case _: InvalidUDFClassException =>
-          makeHiveFunctionExpression(name, clazz, input)
+          val clsForAggregator = classOf[Aggregator[_, _, _]]
+          if (clsForAggregator.isAssignableFrom(clazz)) {
+            val clsForEncoder = classOf[ExpressionEncoder[_]]
+            val aggregator = clazz.getConstructor().newInstance().asInstanceOf[Aggregator[_, _, _]]
+            // Construct the input encoder
+            val mirror = runtimeMirror(clazz.getClassLoader)
+            val classType = mirror.classSymbol(clazz)
+            val baseClassType = typeOf[Aggregator[_, _, _]].typeSymbol.asClass
+            val baseType = internal.thisType(classType).baseType(baseClassType)
+            val tpe = baseType.typeArgs.head
+            val cls = mirror.runtimeClass(tpe)
+            val serializer = ScalaReflection.serializerForType(tpe)
+            val deserializer = ScalaReflection.deserializerForType(tpe)
+            val inputEncoder = new ExpressionEncoder(
+              serializer,
+              deserializer,
+              ClassTag(cls))
+
+            val e = classOf[ScalaAggregator[_, _, _]].getConstructor(classOf[Seq[Expression]],

Review comment:
       Thanks for your reminder.
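
A hedged sketch of the type-argument lookup that the hunk above performs with Scala runtime reflection: view the concrete class as its `Aggregator[IN, BUF, OUT]` base type and read the first type argument. It assumes Scala 2 with scala-reflect on the classpath and top-level (non-local) classes. `Aggregator` here is a hypothetical stand-in for the Spark class, and `InputTypeLookup` / `inputTypeOf` are illustrative names.

```scala
import scala.reflect.runtime.universe._

// Hypothetical stand-in for Spark's Aggregator[IN, BUF, OUT].
abstract class Aggregator[IN, BUF, OUT]

// Illustrative implementation with concrete type arguments.
class MyDoubleAverage extends Aggregator[Double, (Double, Long), Double]

object InputTypeLookup {
  // Returns the IN type argument that `clazz` supplies to Aggregator.
  def inputTypeOf(clazz: Class[_]): Type = {
    val mirror = runtimeMirror(clazz.getClassLoader)
    val classSymbol = mirror.classSymbol(clazz)
    val baseClass = typeOf[Aggregator[_, _, _]].typeSymbol.asClass
    // baseType re-expresses the concrete class as Aggregator[IN, BUF, OUT];
    // typeArgs.head is then IN.
    classSymbol.toType.baseType(baseClass).typeArgs.head
  }
}
```

This approach only yields a usable type when the subclass fixes `IN` to a concrete type. A class that leaves `IN` as its own type parameter would return that parameter symbol instead.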






[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945389209


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48824/
   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945662045


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48841/
   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946403266


   **[Test build #144393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144393/testReport)** for PR 34303 at commit [`815c65c`](https://github.com/apache/spark/commit/815c65c21c0a83b74a05ce42bf2e1c75b5bc4f56).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945088714


   **[Test build #144330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144330/testReport)** for PR 34303 at commit [`3702b9b`](https://github.com/apache/spark/commit/3702b9b1ac974810fff4491d2fb0130712578306).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946344980


   **[Test build #144393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144393/testReport)** for PR 34303 at commit [`815c65c`](https://github.com/apache/spark/commit/815c65c21c0a83b74a05ce42bf2e1c75b5bc4f56).




[GitHub] [spark] SparkQA removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-946344980


   **[Test build #144393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144393/testReport)** for PR 34303 at commit [`815c65c`](https://github.com/apache/spark/commit/815c65c21c0a83b74a05ce42bf2e1c75b5bc4f56).




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945083668


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48809/
   




[GitHub] [spark] beliefer commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730555343



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala
##########
@@ -179,4 +181,24 @@ class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-37018: Spark SQL should support create function with Aggregator") {
+    val avgFuncClass = "org.apache.spark.sql.hive.execution.MyDoubleAverage"
+    val functionName = "test_udf"
+    withTempDatabase { dbName =>
+      withUserDefinedFunction(
+        s"default.$functionName" -> false,
+        s"$dbName.$functionName" -> false,
+        functionName -> true) {
+        // create a function in default database
+        sql("USE DEFAULT")
+        sql(s"CREATE FUNCTION $functionName AS '$avgFuncClass'")
+        // create a view using a function in 'default' database
+        withView("v1") {
+          sql(s"CREATE VIEW v1 AS SELECT $functionName(col1) AS func FROM VALUES (1), (2), (3)")
+          checkAnswer(sql(s"SELECT * FROM v1"), Seq(Row(102.0)))

Review comment:
       Thanks.
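
For context, a sketch of what an `Aggregator` registered through `CREATE FUNCTION` could look like. The real `MyDoubleAverage` is not shown in this thread. The expected `Row(102.0)` for inputs 1, 2, 3 suggests it returns the average plus 100, and that is the assumption made here. `MyDoubleAverageSketch` is a hypothetical name, and the block needs Spark SQL on the classpath.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: average of the inputs plus 100 (assumed from the
// expected test result, not taken from the PR).
class MyDoubleAverageSketch extends Aggregator[Double, (Double, Long), Double] {
  // Buffer is (running sum, count).
  def zero: (Double, Long) = (0.0, 0L)

  // Fold one input value into the buffer.
  def reduce(b: (Double, Long), a: Double): (Double, Long) = (b._1 + a, b._2 + 1L)

  // Combine two partial buffers.
  def merge(b1: (Double, Long), b2: (Double, Long)): (Double, Long) =
    (b1._1 + b2._1, b1._2 + b2._2)

  // Final result: 100 + sum / count (NaN for an empty group).
  def finish(r: (Double, Long)): Double = 100.0 + r._1 / r._2

  def bufferEncoder: Encoder[(Double, Long)] =
    Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)

  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

The no-argument constructor matters: the hunk under review instantiates the class with `clazz.getConstructor().newInstance()`.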






[GitHub] [spark] beliefer commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730726906



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +124,58 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
+        // Unfortunately we need to use reflection here because Aggregator
+        // and ScalaAggregator are defined in sql/core module.

Review comment:
       Oh, thank you for the reminder.






[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945584651


   **[Test build #144365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144365/testReport)** for PR 34303 at commit [`136b8f5`](https://github.com/apache/spark/commit/136b8f50e699ea026144c434dda1e9cffe90ae64).




[GitHub] [spark] beliefer commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945619932


   retest this please




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945622249


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48840/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945673916


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48840/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945088908


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144330/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945072710


   **[Test build #144330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144330/testReport)** for PR 34303 at commit [`3702b9b`](https://github.com/apache/spark/commit/3702b9b1ac974810fff4491d2fb0130712578306).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945088908


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144330/
   




[GitHub] [spark] SparkQA commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945078023


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48809/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945407476


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48824/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-945415888


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144346/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730971965



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
##########
@@ -119,10 +125,52 @@ private[sql] class HiveSessionCatalog(
       try {
         super.makeFunctionExpression(name, clazz, input)
       } catch {
-        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
-        // Hive UDF/UDAF/UDTF with function definition. Otherwise, we just throw it earlier.
+        // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we try to construct
+        // ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
+        // we just throw it earlier.
         case _: InvalidUDFClassException =>
-          makeHiveFunctionExpression(name, clazz, input)
+          val clsForAggregator = classOf[Aggregator[_, _, _]]
+          if (clsForAggregator.isAssignableFrom(clazz)) {
+            val clsForEncoder = classOf[ExpressionEncoder[_]]
+            val aggregator = clazz.getConstructor().newInstance().asInstanceOf[Aggregator[_, _, _]]
+            // Construct the input encoder
+            val mirror = runtimeMirror(clazz.getClassLoader)
+            val classType = mirror.classSymbol(clazz)
+            val baseClassType = typeOf[Aggregator[_, _, _]].typeSymbol.asClass
+            val baseType = internal.thisType(classType).baseType(baseClassType)
+            val tpe = baseType.typeArgs.head
+            val cls = mirror.runtimeClass(tpe)
+            val serializer = ScalaReflection.serializerForType(tpe)
+            val deserializer = ScalaReflection.deserializerForType(tpe)
+            val inputEncoder = new ExpressionEncoder(
+              serializer,
+              deserializer,
+              ClassTag(cls))
+
+            val e = classOf[ScalaAggregator[_, _, _]].getConstructor(classOf[Seq[Expression]],

Review comment:
       Can't we just call `new ScalaAggregator` directly instead of looking up the constructor via reflection?
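
       To illustrate the point of the question, here is a minimal plain-Scala sketch contrasting the two construction styles. `AggWrapper` is a hypothetical stand-in, not Spark's `ScalaAggregator`, and its two parameters are assumptions chosen only to keep the example self-contained.

```scala
// Hypothetical stand-in for a wrapper class; this is NOT Spark's ScalaAggregator signature.
final case class AggWrapper(children: Seq[String], name: String)

object ConstructionSketch {
  // Reflective construction: the constructor lookup and the argument
  // types are only checked at runtime.
  def viaReflection(children: Seq[String], name: String): AggWrapper = {
    val ctor = classOf[AggWrapper].getConstructor(classOf[Seq[_]], classOf[String])
    ctor.newInstance(children, name)
  }

  // Direct construction: the compiler checks argument count and types.
  def direct(children: Seq[String], name: String): AggWrapper =
    AggWrapper(children, name)
}
```

       Both paths build an equal value here, but only the direct call fails at compile time if the constructor signature changes.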






[GitHub] [spark] cloud-fan commented on a change in pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34303:
URL: https://github.com/apache/spark/pull/34303#discussion_r730975443



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala
##########
@@ -179,4 +181,24 @@ class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-37018: Spark SQL should support create function with Aggregator") {
+    val avgFuncClass = "org.apache.spark.sql.hive.execution.MyDoubleAverage"
+    val functionName = "test_udf"
+    withTempDatabase { dbName =>
+      withUserDefinedFunction(
+        s"default.$functionName" -> false,
+        s"$dbName.$functionName" -> false,
+        functionName -> true) {
+        // create a function in default database
+        sql("USE DEFAULT")
+        sql(s"CREATE FUNCTION $functionName AS '$avgFuncClass'")

Review comment:
       Can we add some basic tests to make sure the function can actually be called? Ideally also with compatible input types to exercise the implicit cast, and with incompatible input types to make sure the type check works.
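
       A rough sketch of what such checks might look like follows. The body of `MyDoubleAverage` is not shown in this thread, so `DoubleAverageSketch` and `AvgBuffer` below are hypothetical stand-ins. The function is registered through the existing `functions.udaf` API (Spark 3.0+) rather than through `CREATE FUNCTION`, since the latter is what this PR proposes. Whether the INT input is cast and the ARRAY input is rejected is what the real test would have to assert; the sketch only prints what it observes.

```scala
import scala.util.Try

import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical buffer type: running sum and row count.
case class AvgBuffer(sum: Double, count: Long)

// Hypothetical stand-in for the MyDoubleAverage class named in the test.
class DoubleAverageSketch extends Aggregator[Double, AvgBuffer, Double] {
  def zero: AvgBuffer = AvgBuffer(0.0, 0L)
  def reduce(b: AvgBuffer, a: Double): AvgBuffer = AvgBuffer(b.sum + a, b.count + 1)
  def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer =
    AvgBuffer(b1.sum + b2.sum, b1.count + b2.count)
  // An empty group yields NaN in this sketch; the real class may differ.
  def finish(r: AvgBuffer): Double = if (r.count == 0) Double.NaN else r.sum / r.count
  def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object AggregatorCallSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("agg-sketch").getOrCreate()
    spark.udf.register("sketch_avg", functions.udaf(new DoubleAverageSketch))

    // 1. Basic call with DOUBLE input.
    spark.sql("SELECT sketch_avg(x) FROM VALUES (1.0D), (2.0D), (3.0D) AS t(x)").show()

    // 2. INT input: exercises the implicit cast to DOUBLE, if one is applied.
    spark.sql("SELECT sketch_avg(x) FROM VALUES (1), (2), (3) AS t(x)").show()

    // 3. ARRAY input: a type check would be expected to reject this query.
    val rejected =
      Try(spark.sql("SELECT sketch_avg(x) FROM VALUES (array(1)) AS t(x)").collect()).isFailure
    println(s"incompatible input rejected: $rejected")

    spark.stop()
  }
}
```

       In the suite itself, the three cases would presumably be expressed with `checkAnswer` and `intercept[AnalysisException]` instead of printing.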






[GitHub] [spark] beliefer commented on pull request #34303: [SPARK-37018][SQL] Spark SQL should support create function with Aggregator

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #34303:
URL: https://github.com/apache/spark/pull/34303#issuecomment-948364624


   Because https://github.com/apache/spark/pull/34340 refactors the architecture for registering user-defined functions, I opened https://github.com/apache/spark/pull/34352 to replace this one.

