Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/06 06:31:56 UTC

[GitHub] [spark] beliefer opened a new pull request, #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

beliefer opened a new pull request, #36773:
URL: https://github.com/apache/spark/pull/36773

   ### What changes were proposed in this pull request?
   Spark now supports many linear regression aggregate functions.
   Because `REGR_AVGX`, `REGR_AVGY`, `REGR_COUNT`, `REGR_SXX` and `REGR_SXY` are replaced with other expressions at runtime, this PR only translates `REGR_INTERCEPT`, `REGR_R2`, `REGR_SLOPE`, `REGR_SXY` for pushdown.
   
   
   ### Why are the changes needed?
   Allow implementations of *Dialect to compile `REGR_INTERCEPT`, `REGR_R2`, `REGR_SLOPE` and `REGR_SXY`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature.
   
   
   ### How was this patch tested?
   New tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r902357088


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -60,18 +60,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && f.isDistinct == false =>

Review Comment:
   ```suggestion
           case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
   ```





[GitHub] [spark] beliefer commented on pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36773:
URL: https://github.com/apache/spark/pull/36773#issuecomment-1178484804

   @cloud-fan @huaxingao Thank you for your review.




[GitHub] [spark] cloud-fan commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r902356266


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -750,6 +750,22 @@ object DataSourceStrategy
         PushableColumnWithoutNestedColumn(right), _) =>
           Some(new GeneralAggregateFunc("CORR", agg.isDistinct,
             Array(FieldReference.column(left), FieldReference.column(right))))
+        case aggregate.RegrIntercept(PushableColumnWithoutNestedColumn(left),
+        PushableColumnWithoutNestedColumn(right)) =>
+          Some(new GeneralAggregateFunc("REGR_INTERCEPT", agg.isDistinct,
+            Array(FieldReference.column(left), FieldReference.column(right))))
+        case aggregate.RegrR2(PushableColumnWithoutNestedColumn(left),

Review Comment:
   Not only in this PR, but why can't we use `PushableExpression`?
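
   For readers outside the PR, the difference can be sketched with toy extractors. These are simplified stand-ins, not Spark's actual `PushableColumnWithoutNestedColumn` or `PushableExpression` classes: a column-only extractor matches bare column references, while an expression-level extractor can also translate composite expressions, which is what the question is getting at.
   
   ```scala
   // Toy expression tree standing in for Catalyst expressions
   // (an assumption for illustration; Spark's real classes are richer).
   sealed trait Expr
   case class Col(name: String) extends Expr
   case class Add(left: Expr, right: Expr) extends Expr
   
   // Column-only extractor: succeeds only on bare column references,
   // analogous to the column-based pattern used in the diff.
   object PushableColumn {
     def unapply(e: Expr): Option[String] = e match {
       case Col(n) => Some(n)
       case _      => None
     }
   }
   
   // Expression-level extractor: recursively translates composite
   // expressions as well, so more aggregates become pushable.
   object PushableExpression {
     def unapply(e: Expr): Option[String] = e match {
       case Col(n) => Some(n)
       case Add(PushableExpression(l), PushableExpression(r)) => Some(s"($l + $r)")
       case _ => None
     }
   }
   ```
   
   With these, `REGR_SLOPE(a + 1, b)` would be rejected by the column-only pattern but accepted by the expression-level one.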





[GitHub] [spark] beliefer commented on pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36773:
URL: https://github.com/apache/spark/pull/36773#issuecomment-1147223345

   ping @huaxingao cc @cloud-fan 




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r916608215


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df2, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {

Review Comment:
   @beliefer mind taking a look please?





[GitHub] [spark] srowen commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890245878


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -750,6 +750,22 @@ object DataSourceStrategy
         PushableColumnWithoutNestedColumn(right), _) =>
           Some(new GeneralAggregateFunc("CORR", agg.isDistinct,
             Array(FieldReference.column(left), FieldReference.column(right))))
+        case aggregate.RegrIntercept(PushableColumnWithoutNestedColumn(left),

Review Comment:
   Not required, but it'd be nice to find a way to fix the indentation here: the continuation line should not line up with `case`.





[GitHub] [spark] beliefer commented on pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36773:
URL: https://github.com/apache/spark/pull/36773#issuecomment-1168614462

   ping @cloud-fan 




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r916607535


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df2, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {

Review Comment:
   This also seems to be failing with ANSI mode on:
   
   ```
   2022-07-08T01:56:48.3914077Z [info] - scan with aggregate push-down: linear regression functions with filter and group by *** FAILED *** (350 milliseconds)
   2022-07-08T01:56:48.3915454Z [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 416.0 failed 1 times, most recent failure: Lost task 0.0 in stage 416.0 (TID 379) (localhost executor driver): org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" (except for ANSI interval type) to bypass this error.
   2022-07-08T01:56:48.3916839Z [info] 	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:184)
   2022-07-08T01:56:48.3917602Z [info] 	at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
   2022-07-08T01:56:48.3918487Z [info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.hashAgg_doAggregateWithKeysOutput_0$(Unknown Source)
   2022-07-08T01:56:48.3919355Z [info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
   2022-07-08T01:56:48.3920199Z [info] 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   2022-07-08T01:56:48.3921079Z [info] 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
   2022-07-08T01:56:48.3921693Z [info] 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
   2022-07-08T01:56:48.3922208Z [info] 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
   2022-07-08T01:56:48.3922715Z [info] 	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1917)
   2022-07-08T01:56:48.3923231Z [info] 	at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1268)
   2022-07-08T01:56:48.3923746Z [info] 	at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1268)
   2022-07-08T01:56:48.3924290Z [info] 	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2262)
   2022-07-08T01:56:48.3924862Z [info] 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
   2022-07-08T01:56:48.3925414Z [info] 	at org.apache.spark.scheduler.Task.run(Task.scala:139)
   2022-07-08T01:56:48.3925982Z [info] 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
   2022-07-08T01:56:48.3926545Z [info] 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
   2022-07-08T01:56:48.3927090Z [info] 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
   2022-07-08T01:56:48.3927701Z [info] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   2022-07-08T01:56:48.3928327Z [info] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   2022-07-08T01:56:48.3928833Z [info] 	at java.lang.Thread.run(Thread.java:750)
   2022-07-08T01:56:48.3929196Z [info] 
   2022-07-08T01:56:48.3929548Z [info] Driver stacktrace:
   2022-07-08T01:56:48.3930152Z [info]   at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2706)
   2022-07-08T01:56:48.3930839Z [info]   at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2642)
   2022-07-08T01:56:48.3931492Z [info]   at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2641)
   2022-07-08T01:56:48.3932111Z [info]   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   2022-07-08T01:56:48.3932717Z [info]   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   2022-07-08T01:56:48.3933306Z [info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   2022-07-08T01:56:48.3933969Z [info]   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2641)
   2022-07-08T01:56:48.3937642Z [info]   at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1189)
   2022-07-08T01:56:48.3941965Z [info]   at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1189)
   2022-07-08T01:56:48.3943119Z [info]   at scala.Option.foreach(Option.scala:407)
   2022-07-08T01:56:48.3943813Z [info]   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1189)
   2022-07-08T01:56:48.3944644Z [info]   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2897)
   2022-07-08T01:56:48.3945466Z [info]   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2836)
   2022-07-08T01:56:48.3946265Z [info]   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2825)
   2022-07-08T01:56:48.3947177Z [info]   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   2022-07-08T01:56:48.3947776Z [info]   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
   2022-07-08T01:56:48.4330080Z [info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2222)
   2022-07-08T01:56:48.4330873Z [info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2243)
   2022-07-08T01:56:48.4331499Z [info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2262)
   2022-07-08T01:56:48.4332552Z [info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2287)
   2022-07-08T01:56:48.4333184Z [info]   at org.apache.spark.rdd.RDD.count(RDD.scala:1268)
   2022-07-08T01:56:48.4333923Z [info]   at org.apache.spark.sql.QueryTest$.$anonfun$getErrorMessageInCheckAnswer$1(QueryTest.scala:265)
   2022-07-08T01:56:48.4334537Z [info]   at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
   2022-07-08T01:56:48.4335504Z [info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:171)
   2022-07-08T01:56:48.4336158Z [info]   at org.apache.spark.sql.QueryTest$.getErrorMessageInCheckAnswer(QueryTest.scala:265)
   2022-07-08T01:56:48.4336802Z [info]   at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:242)
   2022-07-08T01:56:48.4337377Z [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:150)
   2022-07-08T01:56:48.4337959Z [info]   at org.apache.spark.sql.jdbc.JDBCV2Suite.$anonfun$new$204(JDBCV2Suite.scala:1745)
   2022-07-08T01:56:48.4338601Z [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   2022-07-08T01:56:48.4339126Z [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   2022-07-08T01:56:48.4339645Z [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   2022-07-08T01:56:48.4340150Z [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   2022-07-08T01:56:48.4340653Z [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
   2022-07-08T01:56:48.4341312Z [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
   2022-07-08T01:56:48.4369891Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
   2022-07-08T01:56:48.4370698Z [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:204)
   2022-07-08T01:56:48.4371447Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
   2022-07-08T01:56:48.4372349Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
   2022-07-08T01:56:48.4373008Z [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
   2022-07-08T01:56:48.4373668Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
   2022-07-08T01:56:48.4374361Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
   2022-07-08T01:56:48.4375096Z [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:65)
   2022-07-08T01:56:48.4375800Z [info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
   2022-07-08T01:56:48.4376475Z [info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   2022-07-08T01:56:48.4377124Z [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:65)
   2022-07-08T01:56:48.4377802Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
   2022-07-08T01:56:48.4378466Z [info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   2022-07-08T01:56:48.4379077Z [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   2022-07-08T01:56:48.4379698Z [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   2022-07-08T01:56:48.4380329Z [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   2022-07-08T01:56:48.4380955Z [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   2022-07-08T01:56:48.4381607Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
   2022-07-08T01:56:48.4382292Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
   2022-07-08T01:56:48.4382973Z [info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
   2022-07-08T01:56:48.4383559Z [info]   at org.scalatest.Suite.run(Suite.scala:1112)
   2022-07-08T01:56:48.4384087Z [info]   at org.scalatest.Suite.run$(Suite.scala:1094)
   2022-07-08T01:56:48.4384760Z [info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
   2022-07-08T01:56:48.4385468Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
   2022-07-08T01:56:48.4386094Z [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
   2022-07-08T01:56:48.4386725Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
   2022-07-08T01:56:48.4387381Z [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
   2022-07-08T01:56:48.4388169Z [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:65)
   2022-07-08T01:56:48.4388882Z [info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   2022-07-08T01:56:48.4389533Z [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   2022-07-08T01:56:48.4390219Z [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   2022-07-08T01:56:48.4390838Z [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:65)
   2022-07-08T01:56:48.4391511Z [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
   2022-07-08T01:56:48.4392296Z [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
   2022-07-08T01:56:48.4433669Z [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
   2022-07-08T01:56:48.4434367Z [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   2022-07-08T01:56:48.4435084Z [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   2022-07-08T01:56:48.4435808Z [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   2022-07-08T01:56:48.4436787Z [info]   at java.lang.Thread.run(Thread.java:750)
   2022-07-08T01:56:48.4437780Z [info]   Cause: org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" (except for ANSI interval type) to bypass this error.
   2022-07-08T01:56:48.4438954Z [info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:184)
   2022-07-08T01:56:48.4439793Z [info]   at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
   2022-07-08T01:56:48.4441245Z [info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.hashAgg_doAggregateWithKeysOutput_0$(Unknown Source)
   2022-07-08T01:56:48.4442235Z [info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
   2022-07-08T01:56:48.4443261Z [info]   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   2022-07-08T01:56:48.4444051Z [info]   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
   2022-07-08T01:56:48.4444879Z [info]   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
   2022-07-08T01:56:48.4445474Z [info]   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
   2022-07-08T01:56:48.4446193Z [info]   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1917)
   2022-07-08T01:56:48.4447254Z [info]   at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1268)
   2022-07-08T01:56:48.4448107Z [info]   at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1268)
   2022-07-08T01:56:48.4449171Z [info]   at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2262)
   2022-07-08T01:56:48.4449891Z [info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
   2022-07-08T01:56:48.4450452Z [info]   at org.apache.spark.scheduler.Task.run(Task.scala:139)
   2022-07-08T01:56:48.4451018Z [info]   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
   2022-07-08T01:56:48.4451581Z [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
   2022-07-08T01:56:48.4452202Z [info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
   2022-07-08T01:56:48.4452818Z [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   2022-07-08T01:56:48.4453447Z [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   2022-07-08T01:56:48.4453968Z [info]   at java.lang.Thread.run(Thread.java:750)
   ```
   
   https://github.com/apache/spark/runs/7244240118?check_suite_focus=true
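
   The failure mode is easy to see from the definition of `REGR_R2`: it divides by the variance of the x column, so any group in which all x values are equal produces a zero divisor, which ANSI mode turns into a `DIVIDE_BY_ZERO` error instead of a null. A plain Scala stand-in for the arithmetic (not Spark code; the null/1.0 branches follow the conventional REGR_R2 semantics):
   
   ```scala
   // Population-style sums of squares over paired samples; the sxx == 0
   // branch is exactly the zero-divisor case that trips ANSI mode.
   def regrR2(xs: Seq[Double], ys: Seq[Double]): Option[Double] = {
     val n = xs.length
     val mx = xs.sum / n
     val my = ys.sum / n
     val sxx = xs.map(x => (x - mx) * (x - mx)).sum
     val syy = ys.map(y => (y - my) * (y - my)).sum
     val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
     if (sxx == 0) None            // all x equal: division by zero -> null (or error in ANSI mode)
     else if (syy == 0) Some(1.0)  // all y equal but x varies: perfect fit by convention
     else Some(sxy * sxy / (sxx * syy))
   }
   ```
   
   A test group whose x values are constant reproduces the zero divisor, which suggests either adjusting the test data or tolerating the ANSI error path.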





[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890714758


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -72,6 +72,26 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 2)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"CORR($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_INTERCEPT" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_INTERCEPT($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_R2" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_R2($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SLOPE" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SLOPE($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXX" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SXX($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXY" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   OK
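
   The five branches in the diff above differ only in the function name, so they could be collapsed into one data-driven case. A minimal sketch, using a stand-in type with just the fields the dialect code reads (an assumption for illustration, not Spark's real `GeneralAggregateFunc` API):
   
   ```scala
   // Stand-in for GeneralAggregateFunc: name, DISTINCT flag, compiled children.
   case class AggFunc(name: String, isDistinct: Boolean, children: Seq[String])
   
   // The REGR_* branches share one shape, so a name set plus a single
   // pattern replaces five near-identical cases.
   val regrFunctions = Set("REGR_INTERCEPT", "REGR_R2", "REGR_SLOPE", "REGR_SXX", "REGR_SXY")
   
   def compileAggregate(f: AggFunc): Option[String] = f match {
     case AggFunc(name, distinct, Seq(x, y)) if regrFunctions.contains(name) =>
       val d = if (distinct) "DISTINCT " else ""
       Some(s"$name($d$x, $y)")
     case _ => None
   }
   ```
   
   The `Seq(x, y)` pattern also subsumes the repeated `assert(f.children().length == 2)` checks, since a two-element match fails cleanly otherwise.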





[GitHub] [spark] cloud-fan commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r913490082


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -62,18 +62,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
           assert(f.children().length == 2)
-          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   so this is a bug fix?





[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r913565329


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -62,18 +62,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
           assert(f.children().length == 2)
-          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   OK





[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r913478436


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -62,18 +62,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
           assert(f.children().length == 2)
-          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   No. It's just that H2 does not support `COVAR_POP` with DISTINCT.





[GitHub] [spark] cloud-fan commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r913497712


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -62,18 +62,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
           assert(f.children().length == 2)
-          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   Can we have a separate PR for this? I think previously the query failed if we pushed down a distinct COVAR_POP to H2.





[GitHub] [spark] huaxingao commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
huaxingao commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890658489


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -72,6 +72,26 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 2)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"CORR($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_INTERCEPT" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_INTERCEPT($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_R2" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_R2($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SLOPE" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SLOPE($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXX" =>

Review Comment:
   Do we ever hit this path? In `translateAggregate` you don't have `REGR_SXX`, and the PR description says that "...... REGR_SXX and REGR_SXY are replaced to other expression in runtime". Actually, `REGR_SXY` is not converted to another expression at runtime, right?



##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -72,6 +72,26 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 2)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"CORR($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_INTERCEPT" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_INTERCEPT($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_R2" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_R2($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SLOPE" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SLOPE($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXX" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SXX($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXY" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   Can we test `DISTINCT` too?



##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1111,6 +1111,28 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {
+    val df = sql(
+      """
+        |SELECT
+        |  REGR_INTERCEPT(bonus, bonus),
+        |  REGR_R2(bonus, bonus),
+        |  REGR_SLOPE(bonus, bonus),
+        |  REGR_SXY(bonus, bonus)
+        |FROM h2.test.employee where dept > 0 group by DePt""".stripMargin)

Review Comment:
   nit: capitalize the SQL keywords `where` and `group by`?
   
   I just noticed that not all the SQL keywords in this test suite are capitalized. Probably worth opening a separate PR to fix them all?
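
   The four per-function cases in the diff above are identical except for the function name, so they could in principle share one branch. A hedged sketch of such a consolidation (illustrative stand-in types, not the committed Spark code; the set below reflects the final PR scope, which excludes `REGR_SXX`, `REGR_AVGX`, `REGR_AVGY`, and `REGR_COUNT` because those are rewritten to other expressions at runtime):

   ```scala
   // Sketch: collapse the REGR_* translation cases into one guarded branch.
   // AggFunc is an illustrative stand-in for Spark's GeneralAggregateFunc.
   case class AggFunc(name: String, isDistinct: Boolean, children: Seq[String])

   val supportedRegr = Set("REGR_INTERCEPT", "REGR_R2", "REGR_SLOPE", "REGR_SXY")

   def compileAggregate(f: AggFunc): Option[String] = f match {
     case AggFunc(name, isDistinct, Seq(y, x)) if supportedRegr.contains(name) =>
       val distinct = if (isDistinct) "DISTINCT " else ""
       Some(s"$name($distinct$y, $x)")
     case _ => None // anything else is left for Spark to evaluate
   }
   ```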





[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890715857


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -750,6 +750,22 @@ object DataSourceStrategy
         PushableColumnWithoutNestedColumn(right), _) =>
           Some(new GeneralAggregateFunc("CORR", agg.isDistinct,
             Array(FieldReference.column(left), FieldReference.column(right))))
+        case aggregate.RegrIntercept(PushableColumnWithoutNestedColumn(left),

Review Comment:
   two indents ?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r916752602


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df2, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {

Review Comment:
   Made a PR: https://github.com/apache/spark/pull/37137





[GitHub] [spark] cloud-fan closed pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown
URL: https://github.com/apache/spark/pull/36773




[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890696454


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1111,6 +1111,28 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {
+    val df = sql(
+      """
+        |SELECT
+        |  REGR_INTERCEPT(bonus, bonus),
+        |  REGR_R2(bonus, bonus),
+        |  REGR_SLOPE(bonus, bonus),
+        |  REGR_SXY(bonus, bonus)
+        |FROM h2.test.employee where dept > 0 group by DePt""".stripMargin)

Review Comment:
   OK





[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890694554


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -72,6 +72,26 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 2)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"CORR($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_INTERCEPT" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_INTERCEPT($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_R2" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_R2($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SLOPE" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SLOPE($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXX" =>

Review Comment:
   Thank you. I forgot to remove it.





[GitHub] [spark] cloud-fan commented on pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36773:
URL: https://github.com/apache/spark/pull/36773#issuecomment-1177762210

   thanks, merging to master!




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r916569488


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df2, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {

Review Comment:
   Hm, this test seems flaky:
   
   ```
   org.scalatest.exceptions.TestFailedException: "== Parsed Logical Plan ==
   'Aggregate ['DePt], [unresolvedalias('REGR_INTERCEPT('bonus, 'bonus), None), unresolvedalias('REGR_R2('bonus, 'bonus), None), unresolvedalias('REGR_SLOPE('bonus, 'bonus), None), unresolvedalias('REGR_SXY('bonus, 'bonus), None)]
   +- 'Filter ('dept > 0)
      +- 'UnresolvedRelation [h2, test, employee], [], false
   
   == Analyzed Logical Plan ==
   regr_intercept(bonus, bonus): double, regr_r2(bonus, bonus): double, regr_slope(bonus, bonus): double, regr_sxy(bonus, bonus): double
   Aggregate [DePt#x], [regr_intercept(bonus#x, bonus#x) AS regr_intercept(bonus, bonus)#x, regr_r2(bonus#x, bonus#x) AS regr_r2(bonus, bonus)#x, regr_slope(bonus#x, bonus#x) AS regr_slope(bonus, bonus)#x, regr_sxy(bonus#x, bonus#x) AS regr_sxy(bonus, bonus)#x]
   +- Filter (dept#x > 0)
      +- SubqueryAlias h2.test.employee
         +- RelationV2[DEPT#x, NAME#x, SALARY#x, BONUS#x, IS_MANAGER#x] h2.test.employee test.employee
   
   == Optimized Logical Plan ==
   Project [REGR_INTERCEPT(BONUS, BONUS)#x AS regr_intercept(bonus#x, bonus#x)#x AS regr_intercept(bonus, bonus)#x, REGR_R2(BONUS, BONUS)#x AS regr_r2(bonus#x, bonus#x)#x AS regr_r2(bonus, bonus)#x, REGR_SLOPE(BONUS, BONUS)#x AS regr_slope(bonus#x, bonus#x)#x AS regr_slope(bonus, bonus)#x, REGR_SXY(BONUS, BONUS)#x AS regr_sxy(bonus#x, bonus#x)#x AS regr_sxy(bonus, bonus)#x]
   +- RelationV2[DEPT#x, REGR_INTERCEPT(BONUS, BONUS)#x, REGR_R2(BONUS, BONUS)#x, REGR_SLOPE(BONUS, BONUS)#x, REGR_SXY(BONUS, BONUS)#x] test.employee
   
   == Physical Plan ==
   *(1) Project [REGR_INTERCEPT(BONUS, BONUS)#x AS regr_intercept(bonus#x, bonus#x)#x AS regr_intercept(bonus, bonus)#x, REGR_R2(BONUS, BONUS)#x AS regr_r2(bonus#x, bonus#x)#x AS regr_r2(bonus, bonus)#x, REGR_SLOPE(BONUS, BONUS)#x AS regr_slope(bonus#x, bonus#x)#x AS regr_slope(bonus, bonus)#x, REGR_SXY(BONUS, BONUS)#x AS regr_sxy(bonus#x, bonus#x)#x AS regr_sxy(bonus, bonus)#x]
   +- *(1) Scan org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$1@6cbb46ad [DEPT#x,REGR_INTERCEPT(BONUS, BONUS)#x,REGR_R2(BONUS, BONUS)#x,REGR_SLOPE(BONUS, BONUS)#x,REGR_SXY(BONUS, BONUS)#x] PushedAggregates: [REGR_INTERCEPT(BONUS, BONUS), REGR_R2(BONUS, BONUS), REGR_SLOPE(BONUS, BONUS), REGR_SXY(BONUS, BONUS)], PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT], ReadSchema: struct<DEPT:int,REGR_INTERCEPT(BONUS, BONUS):double,REGR_R2(BONUS, BONUS):double,REGR_SLOPE(BONUS, BONUS):double,REGR_SXY(BONUS, BONUS):double>
   
   " did not contain " PushedAggregates: [REGR_INTERCEPT(BONUS, BONUS), REGR_R2(BONUS, BONUS), REGR_SLOPE(BONUS, BONUS), REGR_SXY(BONUS, B..., PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT], "
   ```
   
   https://github.com/apache/spark/runs/7247205123





[GitHub] [spark] beliefer commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r913493163


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -62,18 +62,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
           assert(f.children().length == 2)
-          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   Yes. This removes code that is no longer needed.





[GitHub] [spark] cloud-fan commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r913426052


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -62,18 +62,27 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"STDDEV_SAMP($distinct${f.children().head})")
-        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct =>
           assert(f.children().length == 2)
-          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   Does this mean that all distinct functions are unsupported now?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r916719849


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df2, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {

Review Comment:
   Looks like this is a problem from `Covariance` (via `RegrIntercept.covarPop`), which uses `/` and therefore throws an exception when ANSI mode is on.
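
   As a rough illustration of where the division enters, a from-scratch sketch of a population covariance (not Spark's `Covariance` implementation; names and the `Option` guard are assumptions for this example):

   ```scala
   // Sketch: the trailing `/ n` is the kind of division that a strict
   // ANSI-style mode would reject when n == 0. Not Spark internals.
   def covarPop(xs: Seq[Double], ys: Seq[Double]): Option[Double] = {
     require(xs.length == ys.length, "inputs must be the same length")
     val n = xs.length
     if (n == 0) None // guard the division instead of letting it fail
     else {
       val meanX = xs.sum / n
       val meanY = ys.sum / n
       Some(xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum / n)
     }
   }
   ```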





[GitHub] [spark] cloud-fan commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r916571989


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df2, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {

Review Comment:
   It's due to a logical conflict between two PRs merged at the same time. It has been fixed now.


