You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/19 09:02:58 UTC

[GitHub] [spark] beliefer opened a new pull request, #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

beliefer opened a new pull request, #36608:
URL: https://github.com/apache/spark/pull/36608

   ### What changes were proposed in this pull request?
   `REGR_SLOPE` is an ANSI aggregate functions
   
   **Syntax**: REGR_SLOPE(y, x)
   **Arguments**: 
   - **y**:The dependent variable. This must be an expression that can be evaluated to a numeric type.
   - **x**:The independent variable. This must be an expression that can be evaluated to a numeric type.
   
   **Examples**:
   `select k, regr_slope(v, v2) from aggr group by k;`
   
   |  k |    regr_slope(v, v2) |
   |---|--------------------|
   | 1  |          [NULL]            |
   | 2 |     0.831408776       |
   
   The algorithm refers https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
   
   The mainstream database supports `regr_count` show below:
   **Teradata**
   https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/I0~kqsq3f3uNmjUaZr8hDg
   **Snowflake**
   https://docs.snowflake.com/en/sql-reference/functions/regr_slope.html
   **Oracle**
   https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
   **DB2**
   https://www.ibm.com/docs/en/db2/11.5?topic=af-regression-functions-regr-avgx-regr-avgy-regr-count
   **H2**
   http://www.h2database.com/html/functions-aggregate.html#regr_slope
   **Postgresql**
   https://www.postgresql.org/docs/8.4/functions-aggregate.html
   **Sybase**
   https://infocenter.sybase.com/help/topic/com.sybase.help.sqlanywhere.12.0.0/dbreference/regr-slope-function.html
   **Presto**
   https://prestodb.io/docs/current/functions/aggregate.html
   Exasol
   https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/regr_function.htm
   
   ### Why are the changes needed?
   `REGR_SLOPE` is very useful.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'Yes'. New feature.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36608:
URL: https://github.com/apache/spark/pull/36608#discussion_r882396518


##########
sql/core/src/test/resources/sql-tests/results/group-by.sql.out:
##########
@@ -1071,6 +1071,39 @@ struct<k:int,regr_syy(y, x):double>
 2	200.0
 
 
+-- !query
+SELECT regr_slope(y, x) FROM testRegression

Review Comment:
   nit: can we have a new golden file for these linear regression functions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36608:
URL: https://github.com/apache/spark/pull/36608#discussion_r877745569


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala:
##########
@@ -69,7 +69,7 @@ abstract class Covariance(val left: Expression, val right: Expression, nullOnDiv
   }
 
   protected def updateExpressionsDef: Seq[Expression] = {
-    val newN = n + 1.0
+    val newN = count + 1.0

Review Comment:
   ```suggestion
       val newCount = count + 1.0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36608:
URL: https://github.com/apache/spark/pull/36608#issuecomment-1132419433

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36608:
URL: https://github.com/apache/spark/pull/36608#discussion_r877763059


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala:
##########
@@ -34,7 +34,7 @@ abstract class Covariance(val left: Expression, val right: Expression, nullOnDiv
   override def dataType: DataType = DoubleType
   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType)
 
-  protected val n = AttributeReference("n", DoubleType, nullable = false)()
+  protected val count = AttributeReference("count", DoubleType, nullable = false)()

Review Comment:
   shall we make it `protected[sql]` so that we can access it directly in the new expression?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36608:
URL: https://github.com/apache/spark/pull/36608#issuecomment-1139588272

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36608:
URL: https://github.com/apache/spark/pull/36608#issuecomment-1140129061

   @cloud-fan Thank you for your help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36608:
URL: https://github.com/apache/spark/pull/36608#discussion_r883219417


##########
sql/core/src/test/resources/sql-tests/inputs/linear-regression.sql:
##########
@@ -0,0 +1,51 @@
+-- Test aggregate operator with codegen on and off.

Review Comment:
   We don't need to do this, as we are testing functions here, not the Aggregate operator.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope
URL: https://github.com/apache/spark/pull/36608


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36608:
URL: https://github.com/apache/spark/pull/36608#discussion_r883213488


##########
sql/core/src/test/resources/sql-tests/results/group-by.sql.out:
##########
@@ -1071,6 +1071,39 @@ struct<k:int,regr_syy(y, x):double>
 2	200.0
 
 
+-- !query
+SELECT regr_slope(y, x) FROM testRegression

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org