You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/07 00:08:55 UTC

[GitHub] [spark] huaxingao commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

huaxingao commented on code in PR #36773:
URL: https://github.com/apache/spark/pull/36773#discussion_r890658489


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -72,6 +72,26 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 2)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"CORR($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_INTERCEPT" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_INTERCEPT($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_R2" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_R2($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SLOPE" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SLOPE($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXX" =>

Review Comment:
   Do we ever hit this path? In `translateAggregate` you don't have `REGR_SXX` and in the PR description it is said that "...... REGR_SXX and REGR_SXY are replaced to other expression in runtime"? Actually `REGR_SXY` is not converted to other expression in runtime, right?



##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala:
##########
@@ -72,6 +72,26 @@ private[sql] object H2Dialect extends JdbcDialect {
           assert(f.children().length == 2)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"CORR($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_INTERCEPT" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_INTERCEPT($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_R2" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_R2($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SLOPE" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SLOPE($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXX" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"REGR_SXX($distinct${f.children().head}, ${f.children().last})")
+        case f: GeneralAggregateFunc if f.name() == "REGR_SXY" =>
+          assert(f.children().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""

Review Comment:
   Can we test `DISTINCT` too?



##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##########
@@ -1111,6 +1111,28 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     checkAnswer(df, Seq(Row(1d), Row(1d), Row(null)))
   }
 
+  test("scan with aggregate push-down: linear regression functions with filter and group by") {
+    val df = sql(
+      """
+        |SELECT
+        |  REGR_INTERCEPT(bonus, bonus),
+        |  REGR_R2(bonus, bonus),
+        |  REGR_SLOPE(bonus, bonus),
+        |  REGR_SXY(bonus, bonus)
+        |FROM h2.test.employee where dept > 0 group by DePt""".stripMargin)

Review Comment:
   nit: capitalize the sql keywords `where` and `group by`?
   
   I just noticed that not all the sql keywords in this test suite are capitalized. Probably open a separate PR to fix them all?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org