Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/28 03:36:15 UTC

[GitHub] [spark] beliefer opened a new pull request, #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

beliefer opened a new pull request, #36708:
URL: https://github.com/apache/spark/pull/36708

   ### What changes were proposed in this pull request?
   `REGR_INTERCEPT` is an ANSI aggregate function.
   
   **Syntax**: REGR_INTERCEPT(y, x)
   **Arguments**: 
   - **y**: The dependent variable. This must be an expression that can be evaluated to a numeric type.
   - **x**: The independent variable. This must be an expression that can be evaluated to a numeric type.
   
   **Examples**:
   `select k, regr_intercept(v, v2) from aggr group by k;`
   
   | k | regr_intercept(v, v2) |
   |---|-----------------------|
   | 1 | [NULL]                |
   | 2 | 1.154734411           |
   
   The algorithm follows https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
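
   As a rough illustration of the single-pass approach described on that page, here is a sketch in plain Scala. It is not the code in this PR: `RegrInterceptSketch`, `RegrState`, `update`, and `regrIntercept` are hypothetical names, and returning `None` when no complete pair was seen or when `x` has zero variance is an assumption of this sketch.

   ```scala
   object RegrInterceptSketch {
     // Running state for one group: pair count, running means,
     // co-moment of (x, y), and second central moment of x.
     final case class RegrState(
         n: Long = 0L,
         xAvg: Double = 0.0,
         yAvg: Double = 0.0,
         ck: Double = 0.0,
         m2x: Double = 0.0) {

       // Fold in one (y, x) pair; pairs with a missing side are skipped.
       def update(y: Option[Double], x: Option[Double]): RegrState = (y, x) match {
         case (Some(yv), Some(xv)) =>
           val n1 = n + 1
           val dx = xv - xAvg                      // deviation from the old mean of x
           val newXAvg = xAvg + dx / n1
           val newYAvg = yAvg + (yv - yAvg) / n1
           RegrState(
             n1,
             newXAvg,
             newYAvg,
             ck + dx * (yv - newYAvg),             // online co-moment update
             m2x + dx * (xv - newXAvg))            // online second-moment update
         case _ => this
       }

       // intercept = avg(y) - slope * avg(x), with slope = ck / m2x.
       // Assumption of this sketch: undefined cases yield None.
       def intercept: Option[Double] =
         if (n == 0 || m2x == 0.0) None else Some(yAvg - (ck / m2x) * xAvg)
     }

     def regrIntercept(pairs: Seq[(Option[Double], Option[Double])]): Option[Double] =
       pairs.foldLeft(RegrState()) { case (s, (y, x)) => s.update(y, x) }.intercept
   }
   ```

   With the pairs (1, 1), (2, 2), (3, 3) the sketch returns `Some(0.0)`, and a group containing only a pair with a missing `y` or `x` returns `None`.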
   
   Mainstream databases that support `regr_intercept` are listed below:
   **Teradata**
   https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/MpkZBV~MSTZ~I84I~ezxNg
   **Snowflake**
   https://docs.snowflake.com/en/sql-reference/functions/regr_intercept.html
   **Oracle**
   https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
   **DB2**
   https://www.ibm.com/docs/en/db2/11.5?topic=af-regression-functions-regr-avgx-regr-avgy-regr-count
   **H2**
   http://www.h2database.com/html/functions-aggregate.html#regr_intercept
   **PostgreSQL**
   https://www.postgresql.org/docs/8.4/functions-aggregate.html
   **Sybase**
   https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.0/dbreference/regr-intercept-function.html
   **Presto**
   https://prestodb.io/docs/current/functions/aggregate.html
   **Exasol**
   https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/regr_function.htm
   
   ### Why are the changes needed?
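   `REGR_INTERCEPT` is an ANSI aggregate function that the mainstream databases listed above already support, so adding it improves Spark SQL's compatibility with them.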
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'. New feature.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36708:
URL: https://github.com/apache/spark/pull/36708#issuecomment-1140376742

   ping @MaxGekk cc @cloud-fan 




[GitHub] [spark] beliefer commented on pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36708:
URL: https://github.com/apache/spark/pull/36708#issuecomment-1146566679

   @MaxGekk @cloud-fan Thank you for the review.




[GitHub] [spark] MaxGekk closed pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept
URL: https://github.com/apache/spark/pull/36708




[GitHub] [spark] cloud-fan commented on a diff in pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36708:
URL: https://github.com/apache/spark/pull/36708#discussion_r884912975


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/linearRegression.scala:
##########
@@ -291,3 +291,52 @@ case class RegrSlope(left: Expression, right: Expression) extends DeclarativeAgg
       newLeft: Expression, newRight: Expression): RegrSlope =
     copy(left = newLeft, right = newRight)
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(y, x) - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(y, x) FROM VALUES (1,1), (2,2), (3,3) AS tab(y, x);
+       0.0
+      > SELECT _FUNC_(y, x) FROM VALUES (1, null) AS tab(y, x);
+       NULL
+      > SELECT _FUNC_(y, x) FROM VALUES (null, 1) AS tab(y, x);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "3.4.0")
+// scalastyle:on line.size.limit
+case class RegrIntercept(left: Expression, right: Expression) extends DeclarativeAggregate
+  with ImplicitCastInputTypes with BinaryLike[Expression] {
+
+  private val regrSlope = RegrSlope(left, right)

Review Comment:
   One more level of indirection looks tricky. Can we create `CovPopulation` and `VariancePop` here instead?
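
   For context, the intercept can be written directly in terms of the population covariance and the population variance, without going through the slope as a separate step. The notation below is the standard definition, not identifiers from this PR:

   $$
   \operatorname{regr\_intercept}(y, x) = \bar{y} - \frac{\operatorname{covar\_pop}(y, x)}{\operatorname{var\_pop}(x)}\,\bar{x}
   $$

   Here $\bar{y}$ and $\bar{x}$ are the means of `y` and `x` over the non-null pairs in the group.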





[GitHub] [spark] MaxGekk commented on pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #36708:
URL: https://github.com/apache/spark/pull/36708#issuecomment-1145630828

   +1, LGTM. Merging to master.
   Thank you, @beliefer, and @cloud-fan for the review.

