Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/09 07:34:06 UTC

[GitHub] [spark] leanken opened a new pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

leanken opened a new pull request #29983:
URL: https://github.com/apache/spark/pull/29983


   ### What changes were proposed in this pull request?
   
   As [SPARK-13860](https://issues.apache.org/jira/browse/SPARK-13860) states, TPCDS Query 39 returns wrong results in SparkSQL. The root cause is that when stddev_samp is applied to a single-element set, the TPCDS answer expects 0.0, whereas SparkSQL returns Double.NaN, which produces the wrong result.
   
   Add an extra legacy config to fall back to the NaN logic, and return 0.0 by default to align with the TPCDS standard.
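
To make the single-element case concrete, here is a minimal standalone sketch in plain Scala. It is not Spark's implementation: the object and method names are hypothetical, and modelling an empty input as `None` is an assumption of the sketch. It only shows why dividing by `n - 1` yields NaN for one element, and what returning 0.0 instead would look like.

```scala
object SampleStdDevSketch {
  // Sum of squared deviations from the mean.
  private def m2(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum
  }

  // Behaviour described above as the root cause:
  // for one element, m2 is 0.0 and n - 1 is 0.0, so 0.0 / 0.0 evaluates to NaN.
  def stddevSampNaN(xs: Seq[Double]): Option[Double] =
    if (xs.isEmpty) None
    else Some(math.sqrt(m2(xs) / (xs.size - 1.0)))

  // Behaviour proposed in this description: 0.0 for a single element.
  def stddevSampZero(xs: Seq[Double]): Option[Double] =
    if (xs.isEmpty) None
    else if (xs.size == 1) Some(0.0)
    else Some(math.sqrt(m2(xs) / (xs.size - 1.0)))
}
```

Both variants agree for two or more elements; they differ only when the input has exactly one element.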
   
   ### Why are the changes needed?
   
   SQL correctness issue.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Updated DataFrameAggregateSuite to test both default and legacy behavior.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706548475








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706023248


   **[Test build #129579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129579/testReport)** for PR 29983 at commit [`1e7894d`](https://github.com/apache/spark/commit/1e7894dec0ad4f926a8a0db78155402840c2e70d).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706660844


   **[Test build #129635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129635/testReport)** for PR 29983 at commit [`a7a6eac`](https://github.com/apache/spark/commit/a7a6eac3e05c57d1e295b460e5b26b4f3287080e).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706481861


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34213/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707520956








[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706816419


   @cloud-fan 




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706905385


   **[Test build #129668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129668/testReport)** for PR 29983 at commit [`16bc5d5`](https://github.com/apache/spark/commit/16bc5d5c4310452fcbd0dea6e7e19e2fcbd10442).




[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken edited a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706833973


   > Seems plausible to me.
   
   ```
   // Sample covariance: null for an empty input, NaN for a single row (n - 1 == 0).
   case class CovSample(left: Expression, right: Expression) extends Covariance(left, right) {
     override val evaluateExpression: Expression = {
       If(n === 0.0, Literal.create(null, DoubleType),
         If(n === 1.0, Double.NaN, ck / (n - 1.0)))
     }
   }
   
   // Pearson correlation: same null/NaN handling for n == 0 and n == 1.
   case class Corr(x: Expression, y: Expression)
     extends PearsonCorrelation(x, y) {
   
     override val evaluateExpression: Expression = {
       If(n === 0.0, Literal.create(null, DoubleType),
         If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
     }
   }
   ```
   
   I found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well,
   and change the config name to `spark.sql.legacy.divideByZeroEvaluation`?
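
For readers outside Catalyst, the `n === 1.0` branch can be illustrated with a standalone sketch in plain Scala. This is not Spark code: the object name, the `legacyNaN` flag, and the use of `Option` to stand in for SQL null are all hypothetical choices made for illustration. Returning null for a single row is the alternative named in the later subject lines of this thread.

```scala
object CovSampleSketch {
  // ck is the co-moment sum((x - meanX) * (y - meanY)), named after the field in the snippet.
  def covSamp(pairs: Seq[(Double, Double)], legacyNaN: Boolean): Option[Double] = {
    val n = pairs.size.toDouble
    if (n == 0.0) {
      None // empty input: null, as in the snippet
    } else if (n == 1.0) {
      // n - 1 == 0: the divide-by-zero case under discussion.
      if (legacyNaN) Some(Double.NaN) else None
    } else {
      val meanX = pairs.map(_._1).sum / n
      val meanY = pairs.map(_._2).sum / n
      val ck = pairs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
      Some(ck / (n - 1.0))
    }
  }
}
```

With `legacyNaN = true` the single-row result matches the snippet; with `false` it becomes null, which is one way a legacy flag could switch between the two behaviours.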
   




[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707521252


   @cloud-fan The tests have passed; if there are no further comments, this is ready to merge.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707452911


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34323/
   




[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502377343



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")

Review comment:
       Also, could you move this config close to the other legacy configs?






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707208116


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34302/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706997976


   **[Test build #129674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129674/testReport)** for PR 29983 at commit [`5095df4`](https://github.com/apache/spark/commit/5095df42051f18a5b752a85dd054db419cebb4f0).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706966330


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34278/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706051707


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34184/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707336710


   **[Test build #129697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129697/testReport)** for PR 29983 at commit [`6eee3c9`](https://github.com/apache/spark/commit/6eee3c9042865b7bf3d268188852928d2cb5cd0c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706645443


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34239/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706073752








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707195321


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34302/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706485872


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34213/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706641024


   **[Test build #129635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129635/testReport)** for PR 29983 at commit [`a7a6eac`](https://github.com/apache/spark/commit/a7a6eac3e05c57d1e295b460e5b26b4f3287080e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707208138








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707213682


   **[Test build #129697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129697/testReport)** for PR 29983 at commit [`6eee3c9`](https://github.com/apache/spark/commit/6eee3c9042865b7bf3d268188852928d2cb5cd0c).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707620997








[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706464852


   > Thank you for your contribution, @leanken .
   > BTW, could you check the UT failure? It looks relevant.
   > 
   > ```
   > org.apache.spark.sql.hive.execution.WindowQuerySuite.windowing.q -- 15. testExpressions
   > ```
   
   sure




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707081312








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707060554


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34287/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706645449








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706979047








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706051723








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707245897








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706833295


   **[Test build #129658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129658/testReport)** for PR 29983 at commit [`7292500`](https://github.com/apache/spark/commit/7292500bd14da44c704b00f125fefaca15541cb4).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706488197


   **[Test build #129610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129610/testReport)** for PR 29983 at commit [`a853132`](https://github.com/apache/spark/commit/a853132353dbbb5f446fd9181c955de0f9bcc43e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706997382










[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707267634


   **[Test build #129696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129696/testReport)** for PR 29983 at commit [`dc8efb6`](https://github.com/apache/spark/commit/dc8efb6cb94fee3a7782498b162f8bd471642348).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706816360


   **[Test build #129651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129651/testReport)** for PR 29983 at commit [`656a0ff`](https://github.com/apache/spark/commit/656a0fffdf49477dc4750764dce18eda73c95a58).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707234745


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34303/
   




[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502364642



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")

Review comment:
       Could you update the migration guide, too?






[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503094201



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -62,6 +63,9 @@ abstract class CentralMomentAgg(child: Expression)
   protected val m3 = AttributeReference("m3", DoubleType, nullable = false)()
   protected val m4 = AttributeReference("m4", DoubleType, nullable = false)()
 
+  protected val divideByZeroEvalResult: Expression =
+    if (SQLConf.get.legacyStatisticalAggregate) Double.NaN else Literal.create(null, DoubleType)

Review comment:
       Can we move the flag to a constructor parameter? E.g.
   ```
   abstract class CentralMomentAgg(child: Expression, nullOnDivideByZero: Boolean) {
     ...
     protected def divideByZeroEvalResult: Expression = if (nullOnDivideByZero) ...
   }
   ```
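
The constructor-parameter idea above can be sketched without Spark on the classpath. The following is only a minimal model under stated assumptions: `EvalResult`, `NullResult`, `DoubleResult`, `MomentAgg`, and `VarSamp` are hypothetical stand-ins invented for illustration (they are not Spark's `Expression` or `CentralMomentAgg` types), and `NullResult` plays the role of SQL NULL.

```scala
// Hypothetical, Spark-free model: the flag is passed in through the
// constructor instead of being read from a global config in the class body.
sealed trait EvalResult
case object NullResult extends EvalResult                       // stands in for SQL NULL
final case class DoubleResult(value: Double) extends EvalResult

abstract class MomentAgg(nullOnDivideByZero: Boolean) {
  // What the aggregate yields when its divisor would be zero.
  protected def divideByZeroEvalResult: EvalResult =
    if (nullOnDivideByZero) NullResult else DoubleResult(Double.NaN)

  // n is the row count, m2 the sum of squared deviations from the mean.
  def evaluate(n: Long, m2: Double): EvalResult
}

// Sample variance is m2 / (n - 1), which has a zero divisor when n == 1.
final case class VarSamp(nullOnDivideByZero: Boolean)
    extends MomentAgg(nullOnDivideByZero) {
  def evaluate(n: Long, m2: Double): EvalResult =
    if (n == 0) NullResult
    else if (n == 1) divideByZeroEvalResult
    else DoubleResult(m2 / (n - 1))
}
```

With this shape the caller decides the behaviour once, at construction time, so `VarSamp(nullOnDivideByZero = true)` and `VarSamp(nullOnDivideByZero = false)` are distinct, self-describing instances whose results do not depend on when a config happens to be read.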






[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706905860


   > > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
   > > And change the config name to spark.sql.legacy.divideByZeroEvaluation?
   > 
   > Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?
   
   Changed it to `spark.sql.legacy.statisticalAggregate`.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707268117


   Merged build finished. Test FAILed.




[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503043290



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##########
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleto
   }
 
   test("windowing.q -- 15. testExpressions") {
-    // Moved because:
-    // - Spark uses a different default stddev (sample instead of pop)
-    // - Tiny numerical differences in stddev results.
-    // - Different StdDev behavior when n=1 (NaN instead of 0)
-    checkAnswer(sql(s"""
-      |select  p_mfgr,p_name, p_size,
-      |rank() over(distribute by p_mfgr sort by p_name) as r,
-      |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-      |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-      |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-      |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-      |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-      |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-      |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-      |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-      |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-      |first_value(p_size) over w1  as fvW1
-      |from part
-      |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-      |             rows between 2 preceding and 2 following)
+    withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {

Review comment:
       I see your point. The legacy coverage will stay only in DataFrameWindowFunctionsSuite; the other unit tests will just be updated to the new expected answers.






[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502381537



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
##########
@@ -456,25 +456,31 @@ class DataFrameAggregateSuite extends QueryTest
   }
 
   test("zero moments") {

Review comment:
       How about organizing the tests like this? (I think it'd be better to avoid updating the existing tests as much as possible):
   ```
     test("zero moments") {
       withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR.key -> "true") {
         // Don't touch the existing tests
         val input = Seq((1, 2)).toDF("a", "b")
         checkAnswer(
           input.agg(stddev($"a"), stddev_samp($"a"), stddev_pop($"a"), variance($"a"),
             var_samp($"a"), var_pop($"a"), skewness($"a"), kurtosis($"a")),
           Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
             Double.NaN, Double.NaN))
   
         checkAnswer(
           input.agg(
             expr("stddev(a)"),
             expr("stddev_samp(a)"),
             expr("stddev_pop(a)"),
             expr("variance(a)"),
             expr("var_samp(a)"),
             expr("var_pop(a)"),
             expr("skewness(a)"),
             expr("kurtosis(a)")),
           Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
             Double.NaN, Double.NaN))
       }
     }
   
     test("SPARK-13860: xxxx") {
       // Writes tests for the new behaviour
     }
   ```
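
To see why the single-row input in the test above hits the divide-by-zero path, here is a small Spark-free sketch. It is an illustration only: `Moments` and `Variance` are hypothetical names, and the running update is a Welford-style recurrence chosen for the example rather than a copy of Spark's implementation.

```scala
// Running moments: row count, mean, and the sum of squared deviations (m2).
final case class Moments(n: Long, mean: Double, m2: Double) {
  def update(x: Double): Moments = {
    val newN = n + 1
    val delta = x - mean
    val newMean = mean + delta / newN
    Moments(newN, newMean, m2 + delta * (x - newMean))
  }
}

object Moments {
  val empty: Moments = Moments(0L, 0.0, 0.0)
  def of(xs: Seq[Double]): Moments = xs.foldLeft(empty)(_.update(_))
}

object Variance {
  // Population variance divides by n.
  def pop(m: Moments): Double = m.m2 / m.n
  // Sample variance divides by n - 1; for one row this is 0.0 / 0, i.e. NaN.
  def samp(m: Moments): Double = m.m2 / (m.n - 1)
}
```

For a single value, `m2` is 0.0 and `n - 1` is 0, so `Variance.samp` evaluates 0.0 / 0 and yields NaN under IEEE 754 arithmetic, while `Variance.pop` yields 0.0. That matches the `Double.NaN` and `0.0` entries in the legacy expectations quoted above; the new default discussed in this PR returns null for the sample variants instead.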






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706485879


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34213/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707208138








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707520956








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707337598








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706488223








[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503329481



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -145,7 +150,12 @@ abstract class CentralMomentAgg(child: Expression)
   group = "agg_funcs",
   since = "1.6.0")
 // scalastyle:on line.size.limit
-case class StddevPop(child: Expression) extends CentralMomentAgg(child) {
+case class StddevPop(
+    child: Expression,
+    nullOnDivideByZero: Boolean = SQLConf.get.legacyStatisticalAggregate)

Review comment:
       or rename the parameter to `nanOnDivideByZero`






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706942525


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34272/
   




[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707032486


   > **[Test build #129674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129674/testReport)** for PR 29983 at commit [`5095df4`](https://github.com/apache/spark/commit/5095df42051f18a5b752a85dd054db419cebb4f0).
   > 
   > * This patch **fails Spark unit tests**.
   > * This patch merges cleanly.
   > * This patch adds no public classes.
   
   Need to update the golden files for the corr and covar_samp behavior change.




[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503002923



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2342,6 +2342,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG =
+    buildConf("spark.sql.legacy.centralMomentAgg")
+      .internal()
+      .doc("When set to true, central moment aggregation will return Double.NaN " +
+        "if divide by zero occurred during calculation. " +
+        "Otherwise, it will return null")

Review comment:
       Sure. How about adding "Before version 3.1.0, it returns NaN by default."?






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706023248








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706485876


   Merged build finished. Test FAILed.




[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503012111



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2342,6 +2342,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG =
+    buildConf("spark.sql.legacy.centralMomentAgg")
+      .internal()
+      .doc("When set to true, central moment aggregation will return Double.NaN " +
+        "if divide by zero occurred during calculation. Otherwise, it will return null. " +
+        "Before version 3.1.0, it returns NaN in divideByZero case by default.")

Review comment:
       Ok, will update.






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707245875


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34303/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707620962


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34340/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706832218








[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502371929



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")

Review comment:
       It looks like we don't need the `.enabled` suffix here; dropping it would follow the naming of the other legacy configs.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706998196


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129674/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706827407


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34255/
   




[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707030116


   **[Test build #129681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129681/testReport)** for PR 29983 at commit [`e0769ca`](https://github.com/apache/spark/commit/e0769caa516e2c57302c6e39ea8a059b43fcdda3).




[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706515894


   **[Test build #129622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129622/testReport)** for PR 29983 at commit [`370cad1`](https://github.com/apache/spark/commit/370cad16258a938d30734ca9a93c2a613b73325e).




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707268117








[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707041156


   **[Test build #129684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129684/testReport)** for PR 29983 at commit [`811d248`](https://github.com/apache/spark/commit/811d2483a57534901859657d4c924a6501ac9749).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707609916


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34340/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503036793



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##########
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleto
   }
 
   test("windowing.q -- 15. testExpressions") {
-    // Moved because:
-    // - Spark uses a different default stddev (sample instead of pop)
-    // - Tiny numerical differences in stddev results.
-    // - Different StdDev behavior when n=1 (NaN instead of 0)
-    checkAnswer(sql(s"""
-      |select  p_mfgr,p_name, p_size,
-      |rank() over(distribute by p_mfgr sort by p_name) as r,
-      |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-      |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-      |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-      |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-      |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-      |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-      |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-      |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-      |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-      |first_value(p_size) over w1  as fvW1
-      |from part
-      |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-      |             rows between 2 preceding and 2 following)
+    withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {

Review comment:
       Does this provide more test coverage than `DataFrameWindowFunctionsSuite` for this legacy config?







[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706073454


   **[Test build #129579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129579/testReport)** for PR 29983 at commit [`1e7894d`](https://github.com/apache/spark/commit/1e7894dec0ad4f926a8a0db78155402840c2e70d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707168825


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129689/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707582873


   **[Test build #129734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129734/testReport)** for PR 29983 at commit [`ddc522c`](https://github.com/apache/spark/commit/ddc522cd2a455a08e3151c363645cc3f001465d5).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707081291


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34290/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707113876


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34295/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706979015


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34278/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706548479


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129622/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706051723








[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503328580



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -62,6 +63,10 @@ abstract class CentralMomentAgg(child: Expression)
   protected val m3 = AttributeReference("m3", DoubleType, nullable = false)()
   protected val m4 = AttributeReference("m4", DoubleType, nullable = false)()
 
+  protected lazy val divideByZeroEvalResult: Expression = {
+    if (nullOnDivideByZero) Double.NaN else Literal.create(null, DoubleType)

Review comment:
       `if (nullOnDivideByZero) Literal.create(null, DoubleType) else Double.NaN`
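
The ordering suggested above can be illustrated with a minimal stand-alone sketch in plain Scala, with no Spark dependency. The names `DivideByZeroSketch`, `divideByZeroResult`, and `varSamp` are hypothetical, and `Option[Double]` is only an assumed stand-in for a nullable SQL double (`None` for NULL). This is not the code in `CentralMomentAgg`; it only shows which branch should produce which value.

```scala
// Hypothetical stand-alone model of the divide-by-zero choice; not Spark code.
object DivideByZeroSketch {
  // None models SQL NULL; Some(Double.NaN) models the legacy NaN result.
  def divideByZeroResult(nullOnDivideByZero: Boolean): Option[Double] =
    if (nullOnDivideByZero) None else Some(Double.NaN)

  // Sample variance divides by n - 1, which is zero for a single-element input.
  def varSamp(xs: Seq[Double], nullOnDivideByZero: Boolean): Option[Double] = {
    val n = xs.length
    if (n == 0) None // assumed: no input rows yields NULL in this sketch
    else if (n == 1) divideByZeroResult(nullOnDivideByZero)
    else {
      val mean = xs.sum / n
      Some(xs.map(x => (x - mean) * (x - mean)).sum / (n - 1))
    }
  }
}
```

With the flag set, a single-element input yields `None` in this sketch; with it unset, the result is `Some(NaN)`, which corresponds to the legacy behavior discussed in this PR.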






[GitHub] [spark] maropu commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706134628








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707041156


   **[Test build #129684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129684/testReport)** for PR 29983 at commit [`811d248`](https://github.com/apache/spark/commit/811d2483a57534901859657d4c924a6501ac9749).




[GitHub] [spark] cloud-fan closed pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #29983:
URL: https://github.com/apache/spark/pull/29983


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707452923








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706525101








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706854751


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706477255


   **[Test build #129610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129610/testReport)** for PR 29983 at commit [`a853132`](https://github.com/apache/spark/commit/a853132353dbbb5f446fd9181c955de0f9bcc43e).




[GitHub] [spark] cloud-fan commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707734001


   thanks, merging to master!




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706854740


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34262/
   




[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706641024


   **[Test build #129635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129635/testReport)** for PR 29983 at commit [`a7a6eac`](https://github.com/apache/spark/commit/a7a6eac3e05c57d1e295b460e5b26b4f3287080e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707168815


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707213682


   **[Test build #129697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129697/testReport)** for PR 29983 at commit [`6eee3c9`](https://github.com/apache/spark/commit/6eee3c9042865b7bf3d268188852928d2cb5cd0c).




[GitHub] [spark] maropu commented on pull request #29983: [SPARK-13860][SQL] Change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706148055


   Do we really need to return 0.0 in this case? It looks like PostgreSQL and MySQL return null instead:
   ```
   mysql> create table t (v float8);
   mysql> insert into t values (1.0);
   mysql> SELECT stddev_samp(v) FROM t;
   +----------------+
   | stddev_samp(v) |
   +----------------+
   |           NULL |
   +----------------+
   1 row in set (0.00 sec)
   
   
   postgres=# create table t (v float8);
   postgres=# insert into t values (1.0);
   INSERT 0 1
   postgres=# \pset null 'null'
   Null display is "null".
   postgres=# SELECT stddev_samp(v) FROM t;
    stddev_samp 
   -------------
           null
   (1 row)
   ```
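
To make the single-element case concrete, here is a minimal standalone sketch in plain Scala. It is not Spark's implementation: `sampleStddev` and its `nullOnDivideByZero` flag are hypothetical names used only for illustration. With n = 1, the sample variance divides a zero sum of squares by n - 1 = 0, which gives NaN under IEEE 754 arithmetic; the NULL that PostgreSQL and MySQL report is modeled here as `None`.

```scala
object SampleStddevSketch {
  // Hypothetical helper, not part of Spark: sample standard deviation of xs.
  // Returns None for an empty input, mirroring SQL NULL.
  def sampleStddev(xs: Seq[Double], nullOnDivideByZero: Boolean): Option[Double] = {
    val n = xs.length
    if (n == 0) None
    else if (n == 1) {
      // n - 1 == 0: either report NULL (None) or keep the NaN result.
      if (nullOnDivideByZero) None else Some(Double.NaN)
    } else {
      val mean = xs.sum / n
      val m2 = xs.map(x => (x - mean) * (x - mean)).sum
      Some(math.sqrt(m2 / (n - 1)))
    }
  }

  def main(args: Array[String]): Unit = {
    println(sampleStddev(Seq(1.0), nullOnDivideByZero = true))
    println(sampleStddev(Seq(1.0), nullOnDivideByZero = false))
    println(sampleStddev(Seq(1.0, 3.0), nullOnDivideByZero = true))
  }
}
```

Returning `None` rather than `0.0` keeps "undefined" distinguishable from a genuinely zero spread, which is the distinction the database output above draws.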




[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706833973


   > Seems plausible to me.
   
   ```
   case class CovSample(left: Expression, right: Expression) extends Covariance(left, right) {
     override val evaluateExpression: Expression = {
       // n == 0 yields null; n == 1 divides by (n - 1) == 0 and yields NaN.
       If(n === 0.0, Literal.create(null, DoubleType),
         If(n === 1.0, Double.NaN, ck / (n - 1.0)))
     }
   }
   
   case class Corr(x: Expression, y: Expression)
     extends PearsonCorrelation(x, y) {
   
     override val evaluateExpression: Expression = {
       // Same pattern: a single row makes the denominator zero, so NaN is returned.
       If(n === 0.0, Literal.create(null, DoubleType),
         If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
     }
   }
   
   I found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
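
A minimal standalone sketch of why sample covariance hits the same divide-by-zero case, in plain Scala. `covSample` is a hypothetical helper, not Spark's `CovSample`, and it deliberately does no special-casing: with a single pair, the co-moment `ck` is 0 and the divisor `n - 1` is 0, so the result is 0.0 / 0.0.

```scala
object CovSampleSketch {
  // Hypothetical helper: sample covariance of paired values, with no
  // special handling for small inputs.
  def covSample(pairs: Seq[(Double, Double)]): Double = {
    val n = pairs.length.toDouble
    val xAvg = pairs.map(_._1).sum / n
    val yAvg = pairs.map(_._2).sum / n
    // Co-moment: sum of products of deviations from the means.
    val ck = pairs.map { case (x, y) => (x - xAvg) * (y - yAvg) }.sum
    ck / (n - 1.0)
  }

  def main(args: Array[String]): Unit = {
    println(covSample(Seq((1.0, 2.0))))             // single pair: 0.0 / 0.0
    println(covSample(Seq((1.0, 2.0), (3.0, 6.0)))) // two pairs: well defined
  }
}
```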
   
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707089687


   **[Test build #129689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129689/testReport)** for PR 29983 at commit [`c4ad6de`](https://github.com/apache/spark/commit/c4ad6de8c27c92ec01565fef4756405821e92f58).




[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502364642



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")

Review comment:
       Could you update the migration guide, too?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =

Review comment:
       nit: `LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR` -> `LEGACY_CENTRAL_MOMENT_AGG`

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")

Review comment:
       It looks like we don't need the `.enabled` suffix, to be consistent with the other legacy configs.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")
+      .internal()
+      .doc("When set to true, stddev_samp and var_samp will return Double.NaN, " +
+        "if applied to a set with a single element. Otherwise, will return 0.0, " +
+        "which is aligned with TPCDS standard.")

Review comment:
       I think we don't need to say `which is aligned with TPCDS standard.` here in the user-facing documentation.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")

Review comment:
       Also, could you move this config close to the other legacy configs?

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
##########
@@ -456,25 +456,31 @@ class DataFrameAggregateSuite extends QueryTest
   }
 
   test("zero moments") {

Review comment:
       How about organizing the tests like this? (I think it'd be better to avoid updating the existing tests as much as possible):
   ```
     test("zero moments") {
       withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR.key -> "true") {
         // Don't touch the existing tests
         val input = Seq((1, 2)).toDF("a", "b")
         checkAnswer(
           input.agg(stddev($"a"), stddev_samp($"a"), stddev_pop($"a"), variance($"a"),
             var_samp($"a"), var_pop($"a"), skewness($"a"), kurtosis($"a")),
           Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
             Double.NaN, Double.NaN))
   
         checkAnswer(
           input.agg(
             expr("stddev(a)"),
             expr("stddev_samp(a)"),
             expr("stddev_pop(a)"),
             expr("variance(a)"),
             expr("var_samp(a)"),
             expr("var_pop(a)"),
             expr("skewness(a)"),
             expr("kurtosis(a)")),
           Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
             Double.NaN, Double.NaN))
       }
     }
   
     test("SPARK-13860: xxxx") {
       // Write tests for the new behaviour
     }
   ```






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706644221


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34239/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707727686








[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706833295








[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503328780



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -62,6 +63,10 @@ abstract class CentralMomentAgg(child: Expression)
   protected val m3 = AttributeReference("m3", DoubleType, nullable = false)()
   protected val m4 = AttributeReference("m4", DoubleType, nullable = false)()
 
+  protected lazy val divideByZeroEvalResult: Expression = {

Review comment:
       This can be a `def`, as it's very cheap to evaluate.
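
The trade-off behind this suggestion can be sketched in plain Scala. The names below are hypothetical and this is not the PR's code: a `lazy val` evaluates its body once and caches the result behind an initialization guard, while a `def` re-evaluates on every call, which costs nothing noticeable when the body is trivial.

```scala
object LazyValVsDef {
  final class Counter {
    var lazyEvals = 0
    var defEvals = 0
    // Evaluated once on first access, then cached in a field.
    lazy val cached: Int = { lazyEvals += 1; 42 }
    // Evaluated on every call; no cached field or initialization guard.
    def recomputed: Int = { defEvals += 1; 42 }
  }

  def main(args: Array[String]): Unit = {
    val c = new Counter
    c.cached; c.cached
    c.recomputed; c.recomputed
    println(s"lazy val body ran ${c.lazyEvals} time(s), def body ran ${c.defEvals} time(s)")
  }
}
```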






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707448016


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34323/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707432512


   **[Test build #129717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129717/testReport)** for PR 29983 at commit [`084c3fb`](https://github.com/apache/spark/commit/084c3fb876906fcf12faa348fe262e10c19ade7e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707081312








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706997364


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34279/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707060583








[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503035902



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##########
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleto
   }
 
   test("windowing.q -- 15. testExpressions") {
-    // Moved because:
-    // - Spark uses a different default stddev (sample instead of pop)
-    // - Tiny numerical differences in stddev results.
-    // - Different StdDev behavior when n=1 (NaN instead of 0)
-    checkAnswer(sql(s"""
-      |select  p_mfgr,p_name, p_size,
-      |rank() over(distribute by p_mfgr sort by p_name) as r,
-      |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-      |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-      |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-      |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-      |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-      |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-      |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-      |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-      |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-      |first_value(p_size) over w1  as fvW1
-      |from part
-      |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-      |             rows between 2 preceding and 2 following)
+    withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {
+      // Moved because:
+      // - Spark uses a different default stddev (sample instead of pop)
+      // - Tiny numerical differences in stddev results.
+      // - Different StdDev behavior when n=1 (NaN instead of 0)

Review comment:
       Can we just update the expected answer? Then we can remove this comment.








[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706931586


   **[Test build #129674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129674/testReport)** for PR 29983 at commit [`5095df4`](https://github.com/apache/spark/commit/5095df42051f18a5b752a85dd054db419cebb4f0).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706073756


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129579/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706919166


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706661046


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129635/
   Test FAILed.




[GitHub] [spark] dongjoon-hyun commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706100847


   Thank you for your contribution, @leanken .
   BTW, could you check the UT failure? It looks relevant.
   ```
   org.apache.spark.sql.hive.execution.WindowQuerySuite.windowing.q -- 15. testExpressions
   ```




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706832218








[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503329174



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -145,7 +150,12 @@ abstract class CentralMomentAgg(child: Expression)
   group = "agg_funcs",
   since = "1.6.0")
 // scalastyle:on line.size.limit
-case class StddevPop(child: Expression) extends CentralMomentAgg(child) {
+case class StddevPop(
+    child: Expression,
+    nullOnDivideByZero: Boolean = SQLConf.get.legacyStatisticalAggregate)

Review comment:
       `nullOnDivideByZero: Boolean = !SQLConf.get.legacyStatisticalAggregate`
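
To see why the negation matters, here is a minimal sketch with a stand-in config object. `Conf` and `StddevSketch` are hypothetical; Spark's `SQLConf` and `StddevPop` are not used. When the legacy flag is off, the default should be to return null on divide-by-zero, so the parameter must default to the opposite of the flag.

```scala
object NullOnDivideByZeroSketch {
  // Stand-in for a session config; hypothetical, not Spark's SQLConf.
  object Conf {
    @volatile var legacyStatisticalAggregate: Boolean = false
  }

  // The default argument is evaluated each time an instance is constructed,
  // and it is the negation of the legacy flag:
  // legacy off => null on divide-by-zero; legacy on => keep the NaN behavior.
  final case class StddevSketch(
      nullOnDivideByZero: Boolean = !Conf.legacyStatisticalAggregate)

  def main(args: Array[String]): Unit = {
    println(StddevSketch().nullOnDivideByZero)
    Conf.legacyStatisticalAggregate = true
    println(StddevSketch().nullOnDivideByZero)
  }
}
```

Without the `!`, the two settings would be swapped: the new null-returning behavior would only apply when the legacy flag is enabled.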






[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707176191


   **[Test build #129696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129696/testReport)** for PR 29983 at commit [`dc8efb6`](https://github.com/apache/spark/commit/dc8efb6cb94fee3a7782498b162f8bd471642348).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707176191


   **[Test build #129696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129696/testReport)** for PR 29983 at commit [`dc8efb6`](https://github.com/apache/spark/commit/dc8efb6cb94fee3a7782498b162f8bd471642348).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706920181


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129658/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707519870


   **[Test build #129717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129717/testReport)** for PR 29983 at commit [`084c3fb`](https://github.com/apache/spark/commit/084c3fb876906fcf12faa348fe262e10c19ade7e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706985526


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34279/
   




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707168381


   **[Test build #129689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129689/testReport)** for PR 29983 at commit [`c4ad6de`](https://github.com/apache/spark/commit/c4ad6de8c27c92ec01565fef4756405821e92f58).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707727686








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707452923








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707093796


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706525101








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706832207


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34255/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706854751








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706849241


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34262/
   




[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502376968



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =
+    buildConf("spark.sql.legacy.centralMomentAgg.enabled")
+      .internal()
+      .doc("When set to true, stddev_samp and var_samp will return Double.NaN, " +
+        "if applied to a set with a single element. Otherwise, will return 0.0, " +
+        "which is aligned with TPCDS standard.")

Review comment:
       I don't think we need to mention `which is aligned with TPCDS standard.` here in the user-facing documentation.






[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706023248


   **[Test build #129579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129579/testReport)** for PR 29983 at commit [`1e7894d`](https://github.com/apache/spark/commit/1e7894dec0ad4f926a8a0db78155402840c2e70d).




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707064914








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706929650


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34272/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706488223








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706919011








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706548475


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706485876








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707126540








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706919166








[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503618972



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out
##########
@@ -141,17 +141,17 @@ struct<var_samp(CAST(CAST(udf(ansi_cast(ansi_cast(b as decimal(38,0)) as string)
 -- !query
 SELECT udf(var_pop(1.0)), var_samp(udf(2.0))
 -- !query schema
-struct<CAST(udf(ansi_cast(var_pop(ansi_cast(1.0 as double)) as string)) AS DOUBLE):double,var_samp(CAST(CAST(udf(ansi_cast(2.0 as string)) AS DECIMAL(2,1)) AS DOUBLE)):double>
+struct<CAST(udf(ansi_cast(var_pop(ansi_cast(1.0 as double), true) as string)) AS DOUBLE):double,var_samp(CAST(CAST(udf(ansi_cast(2.0 as string)) AS DECIMAL(2,1)) AS DOUBLE)):double>

Review comment:
       done






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707073492


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34290/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707337598








[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503069104



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##########
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleto
   }
 
   test("windowing.q -- 15. testExpressions") {
-    // Moved because:
-    // - Spark uses a different default stddev (sample instead of pop)
-    // - Tiny numerical differences in stddev results.
-    // - Different StdDev behavior when n=1 (NaN instead of 0)
-    checkAnswer(sql(s"""
-      |select  p_mfgr,p_name, p_size,
-      |rank() over(distribute by p_mfgr sort by p_name) as r,
-      |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-      |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-      |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-      |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-      |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-      |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-      |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-      |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-      |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-      |first_value(p_size) over w1  as fvW1
-      |from part
-      |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-      |             rows between 2 preceding and 2 following)
+    withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {

Review comment:
       done.

##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##########
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleto
   }
 
   test("windowing.q -- 15. testExpressions") {
-    // Moved because:
-    // - Spark uses a different default stddev (sample instead of pop)
-    // - Tiny numerical differences in stddev results.
-    // - Different StdDev behavior when n=1 (NaN instead of 0)
-    checkAnswer(sql(s"""
-      |select  p_mfgr,p_name, p_size,
-      |rank() over(distribute by p_mfgr sort by p_name) as r,
-      |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-      |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-      |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-      |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-      |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-      |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-      |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-      |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-      |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-      |first_value(p_size) over w1  as fvW1
-      |from part
-      |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-      |             rows between 2 preceding and 2 following)
+    withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {
+      // Moved because:
+      // - Spark uses a different default stddev (sample instead of pop)
+      // - Tiny numerical differences in stddev results.
+      // - Different StdDev behavior when n=1 (NaN instead of 0)

Review comment:
       done

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -174,7 +175,9 @@ case class StddevSamp(child: Expression) extends CentralMomentAgg(child) {
 
   override val evaluateExpression: Expression = {
     If(n === 0.0, Literal.create(null, DoubleType),
-      If(n === 1.0, Double.NaN, sqrt(m2 / (n - 1.0))))
+      If(n === 1.0,
+        if (SQLConf.get.legacyCentralMomentAgg) Double.NaN else Literal.create(null, DoubleType),

Review comment:
       done
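
       The `StddevSamp` hunk above distinguishes three cases by the count `n`. A rough sketch of that logic in plain Scala follows, assuming `None` stands in for SQL `NULL` and the `legacy` parameter stands in for `SQLConf.get.legacyCentralMomentAgg`. `StddevSampSketch` is a hypothetical helper, not the Catalyst expression itself:

```scala
object StddevSampSketch {
  // Sample standard deviation with the n = 0 and n = 1 special cases
  // from the hunk above. None models SQL NULL.
  def stddevSamp(xs: Seq[Double], legacy: Boolean): Option[Double] = {
    val n = xs.size.toDouble
    if (n == 0.0) {
      None // empty input: always NULL
    } else if (n == 1.0) {
      // n - 1 == 0, so the division is undefined:
      // legacy behavior keeps NaN, new behavior returns NULL.
      if (legacy) Some(Double.NaN) else None
    } else {
      val mean = xs.sum / n
      // m2 is the sum of squared deviations from the mean.
      val m2 = xs.map(x => (x - mean) * (x - mean)).sum
      Some(math.sqrt(m2 / (n - 1.0)))
    }
  }
}
```

       For example, `StddevSampSketch.stddevSamp(Seq(2.0), legacy = false)` yields `None`, while the same call with `legacy = true` yields `Some(NaN)`.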






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707064671


   **[Test build #129681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129681/testReport)** for PR 29983 at commit [`e0769ca`](https://github.com/apache/spark/commit/e0769caa516e2c57302c6e39ea8a059b43fcdda3).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `abstract class CentralMomentAgg(child: Expression, nullOnDivideByZero: Boolean)`
     * `case class StddevPop(`
     * `case class StddevSamp(`
     * `case class VariancePop(`
     * `case class VarianceSamp(`
     * `case class Skewness(`
     * `case class Kurtosis(`
     * `abstract class PearsonCorrelation(x: Expression, y: Expression, nullOnDivideByZero: Boolean)`
     * `case class Corr(`
     * `abstract class Covariance(x: Expression, y: Expression, nullOnDivideByZero: Boolean)`
     * `case class CovPopulation(`
     * `case class CovSample(`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707064928


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129681/




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707168815








[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503191478



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -62,6 +63,9 @@ abstract class CentralMomentAgg(child: Expression)
   protected val m3 = AttributeReference("m3", DoubleType, nullable = false)()
   protected val m4 = AttributeReference("m4", DoubleType, nullable = false)()
 
+  protected val divideByZeroEvalResult: Expression =
+    if (SQLConf.get.legacyStatisticalAggregate) Double.NaN else Literal.create(null, DoubleType)

Review comment:
       done
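
For readers following the hunk above, here is a minimal plain-Scala sketch of the behaviour being discussed, assuming the sample variance is computed as `m2 / (n - 1)`, so a single-element input divides by zero. It does not use Spark's API: `DivideByZeroSketch`, `varSamp`, and the `legacy` parameter are hypothetical names for illustration, and `None` stands in for SQL NULL.

```scala
// Hypothetical model of the divide-by-zero handling; not Spark's implementation.
object DivideByZeroSketch {
  // Returns None to model SQL NULL.
  def varSamp(xs: Seq[Double], legacy: Boolean): Option[Double] = {
    val n = xs.size.toDouble
    if (n == 0.0) {
      None // assumption: empty input yields NULL in both modes
    } else if (n == 1.0) {
      // n - 1 == 0: legacy mode yields NaN, the new default yields NULL
      if (legacy) Some(Double.NaN) else None
    } else {
      val mean = xs.sum / n
      val m2 = xs.map(x => (x - mean) * (x - mean)).sum // second central moment sum
      Some(m2 / (n - 1.0))
    }
  }
}
```

With this sketch, `DivideByZeroSketch.varSamp(Seq(1.0), legacy = true)` wraps a NaN, while the same call with `legacy = false` gives `None`.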






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706854759


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34262/




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706919178








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707093796








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707126540








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706998185


   Merged build finished. Test FAILed.




[GitHub] [spark] maropu commented on a change in pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502371596



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2775,6 +2775,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR =

Review comment:
       nit: `LEGACY_CENTRAL_MOMENT_AGG_BEHAVIOR` -> `LEGACY_CENTRAL_MOMENT_AGG`






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707030116


   **[Test build #129681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129681/testReport)** for PR 29983 at commit [`e0769ca`](https://github.com/apache/spark/commit/e0769caa516e2c57302c6e39ea8a059b43fcdda3).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706525092


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34226/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706051723








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707245897








[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706997382








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707064914


   Merged build finished. Test FAILed.




[GitHub] [spark] dongjoon-hyun commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706101418


   cc @maropu 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706661044


   Merged build finished. Test FAILed.




[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503418293



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out
##########
@@ -141,17 +141,17 @@ struct<var_samp(CAST(CAST(udf(ansi_cast(ansi_cast(b as decimal(38,0)) as string)
 -- !query
 SELECT udf(var_pop(1.0)), var_samp(udf(2.0))
 -- !query schema
-struct<CAST(udf(ansi_cast(var_pop(ansi_cast(1.0 as double)) as string)) AS DOUBLE):double,var_samp(CAST(CAST(udf(ansi_cast(2.0 as string)) AS DECIMAL(2,1)) AS DOUBLE)):double>
+struct<CAST(udf(ansi_cast(var_pop(ansi_cast(1.0 as double), true) as string)) AS DOUBLE):double,var_samp(CAST(CAST(udf(ansi_cast(2.0 as string)) AS DECIMAL(2,1)) AS DOUBLE)):double>

Review comment:
       This also avoids all the changes to the explain golden files.






[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706023248


   **[Test build #129579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129579/testReport)** for PR 29983 at commit [`1e7894d`](https://github.com/apache/spark/commit/1e7894dec0ad4f926a8a0db78155402840c2e70d).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707726497


   **[Test build #129734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129734/testReport)** for PR 29983 at commit [`ddc522c`](https://github.com/apache/spark/commit/ddc522cd2a455a08e3151c363645cc3f001465d5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706979047








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706931586


   **[Test build #129674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129674/testReport)** for PR 29983 at commit [`5095df4`](https://github.com/apache/spark/commit/5095df42051f18a5b752a85dd054db419cebb4f0).




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707093670


   **[Test build #129684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129684/testReport)** for PR 29983 at commit [`811d248`](https://github.com/apache/spark/commit/811d2483a57534901859657d4c924a6501ac9749).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706927266


   retest this please




[GitHub] [spark] viirya commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r502868488



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2342,6 +2342,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG =
+    buildConf("spark.sql.legacy.centralMomentAgg")
+      .internal()
+      .doc("When set to true, central moment aggregation will return Double.NaN " +
+        "if divide by zero occurred during calculation. " +
+        "Otherwise, it will return null")

Review comment:
       Can we state which Spark versions have this legacy behavior, e.g., in which versions it returns NaN?






[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707620997








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706942553








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706548210


   **[Test build #129622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129622/testReport)** for PR 29983 at commit [`370cad1`](https://github.com/apache/spark/commit/370cad16258a938d30734ca9a93c2a613b73325e).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706043344


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34184/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706073752


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707432512


   **[Test build #129717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129717/testReport)** for PR 29983 at commit [`084c3fb`](https://github.com/apache/spark/commit/084c3fb876906fcf12faa348fe262e10c19ade7e).




[GitHub] [spark] maropu commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706852592


   > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well,
   > and change the config name to `spark.sql.legacy.divideByZeroEvaluation`?
   
   Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707049625


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34287/
   




[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707582873


   **[Test build #129734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129734/testReport)** for PR 29983 at commit [`ddc522c`](https://github.com/apache/spark/commit/ddc522cd2a455a08e3151c363645cc3f001465d5).




[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken edited a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706869405


   > > Seems plausible to me.
   > 
   > ```
   > case class CovSample(left: Expression, right: Expression) extends Covariance(left, right) {
   >   override val evaluateExpression: Expression = {
   >     If(n === 0.0, Literal.create(null, DoubleType),
   >       If(n === 1.0, Double.NaN, ck / (n - 1.0)))
   >   }
   > 
   > case class Corr(x: Expression, y: Expression)
   >   extends PearsonCorrelation(x, y) {
   > 
   >   override val evaluateExpression: Expression = {
   >     If(n === 0.0, Literal.create(null, DoubleType),
   >       If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
   >   }
   > ```
   > 
   > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
   > And change the config name to `spark.sql.legacy.divideByZeroEvaluation`?
   
   @cloud-fan How about CovSample and Corr? Should I update them in this patch as well?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706645449








[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706869405


   > > Seems plausible to me.
   > 
   > ```
   > case class CovSample(left: Expression, right: Expression) extends Covariance(left, right) {
   >   override val evaluateExpression: Expression = {
   >     If(n === 0.0, Literal.create(null, DoubleType),
   >       If(n === 1.0, Double.NaN, ck / (n - 1.0)))
   >   }
   > 
   > case class Corr(x: Expression, y: Expression)
   >   extends PearsonCorrelation(x, y) {
   > 
   >   override val evaluateExpression: Expression = {
   >     If(n === 0.0, Literal.create(null, DoubleType),
   >       If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
   >   }
   > ```
   > 
   > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
   > And change the config name to `spark.sql.legacy.divideByZeroEvaluation`?
   
   @cloud-fan How about CovSample and Corr? Should I update them in this patch as well?




[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
leanken edited a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706905860


   > > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
   > > And change the config name to spark.sql.legacy.divideByZeroEvaluation?
   > 
   > Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?
   
   Changed to `spark.sql.legacy.statisticalAggregate`, since all of these methods are statistical aggregate functions.




[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706522929


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34226/
   




[GitHub] [spark] maropu commented on pull request #29983: [SPARK-13860][SQL] change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706134628


   Thanks for cc, @dongjoon-hyun !




[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503035473



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##########
@@ -174,7 +175,9 @@ case class StddevSamp(child: Expression) extends CentralMomentAgg(child) {
 
   override val evaluateExpression: Expression = {
     If(n === 0.0, Literal.create(null, DoubleType),
-      If(n === 1.0, Double.NaN, sqrt(m2 / (n - 1.0))))
+      If(n === 1.0,
+        if (SQLConf.get.legacyCentralMomentAgg) Double.NaN else Literal.create(null, DoubleType),

Review comment:
       Can we make this a method in the parent class `CentralMomentAgg`?
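
   A minimal sketch of what such a shared helper could look like, written as a standalone model rather than actual Spark code. The names `CentralMomentModel`, `StddevSampModel`, `VarianceSampModel` and `divideByZeroResult` are hypothetical, `None` stands in for SQL null, and the `legacy` constructor flag stands in for the config lookup shown in the diff:

   ```scala
   object DivideByZeroSketch {
     // Hypothetical stand-in for the parent aggregate class.
     // None models SQL null; Some(Double.NaN) models the legacy result.
     abstract class CentralMomentModel(legacy: Boolean) {
       // Single place that decides what a divide-by-zero evaluates to.
       protected def divideByZeroResult: Option[Double] =
         if (legacy) Some(Double.NaN) else None

       // n is the row count, m2 the sum of squared deviations from the mean.
       def evaluate(n: Double, m2: Double): Option[Double]
     }

     final class StddevSampModel(legacy: Boolean) extends CentralMomentModel(legacy) {
       def evaluate(n: Double, m2: Double): Option[Double] =
         if (n == 0.0) None
         else if (n == 1.0) divideByZeroResult
         else Some(math.sqrt(m2 / (n - 1.0)))
     }

     final class VarianceSampModel(legacy: Boolean) extends CentralMomentModel(legacy) {
       def evaluate(n: Double, m2: Double): Option[Double] =
         if (n == 0.0) None
         else if (n == 1.0) divideByZeroResult
         else Some(m2 / (n - 1.0))
     }
   }
   ```

   With this shape, each subclass only states its own formula, and the legacy switch lives in one method.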






[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706515894


   **[Test build #129622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129622/testReport)** for PR 29983 at commit [`370cad1`](https://github.com/apache/spark/commit/370cad16258a938d30734ca9a93c2a613b73325e).




[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change stddev_samp and var_samp to return 0.0 instead of Double.NaN to align with TPCDS standard.

Posted by GitBox <gi...@apache.org>.
leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706465011


   > Do we really need to return 0.0 in this case? It looks like PostgreSQL and MySQL return null instead:
   > 
   > ```
   > mysql> create table t (v float8);
   > mysql> insert into t values (1.0);
   > mysql> SELECT stddev_samp(v) FROM t;
   > +----------------+
   > | stddev_samp(v) |
   > +----------------+
   > |           NULL |
   > +----------------+
   > 1 row in set (0.00 sec)
   > 
   > 
   > postgres=# create table t (v float8);
   > postgres=# insert into t values (1.0);
   > INSERT 0 1
   > postgres=# \pset null 'null'
   > Null display is "null".
   > postgres=# SELECT stddev_samp(v) FROM t;
   >  stddev_samp 
   > -------------
   >         null
   > (1 row)
   > ```
   
   Let me look for more documentation and check whether returning null matches the TPCDS answer for Q39. I will reply later.
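
   For reference, the single-row case is a divide-by-zero because the sample standard deviation divides the sum of squared deviations by n - 1. A small standalone sketch in plain Scala doubles; the helper name `sampleStddev` is hypothetical and this is not how Spark evaluates the aggregate:

   ```scala
   object SampleStddevSketch {
     // Textbook sample standard deviation: sqrt(m2 / (n - 1)),
     // where m2 is the sum of squared deviations from the mean.
     def sampleStddev(xs: Seq[Double]): Double = {
       require(xs.nonEmpty, "at least one value is needed")
       val n = xs.size.toDouble
       val mean = xs.sum / n
       val m2 = xs.map(x => (x - mean) * (x - mean)).sum
       // With a single value, m2 is 0.0 and n - 1 is 0.0, so this is 0.0 / 0.0.
       math.sqrt(m2 / (n - 1.0))
     }
   }
   ```

   In IEEE 754 arithmetic `0.0 / 0.0` is NaN, so `sampleStddev(Seq(1.0))` yields NaN, whereas the MySQL and PostgreSQL sessions quoted above report null for the same input.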




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503011445



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2342,6 +2342,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CENTRAL_MOMENT_AGG =
+    buildConf("spark.sql.legacy.centralMomentAgg")
+      .internal()
+      .doc("When set to true, central moment aggregation will return Double.NaN " +
+        "if divide by zero occurred during calculation. Otherwise, it will return null. " +
+        "Before version 3.1.0, it returns NaN in divideByZero case by default.")

Review comment:
       Shall we update the migration guide (https://github.com/apache/spark/blob/master/docs/sql-migration-guide.md) too? This looks more like a behaviour change than a bug fix.






[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707060583








[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707089687


   **[Test build #129689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129689/testReport)** for PR 29983 at commit [`c4ad6de`](https://github.com/apache/spark/commit/c4ad6de8c27c92ec01565fef4756405821e92f58).




[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503328058



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##########
@@ -450,14 +450,20 @@ object TypeCoercion {
       case Abs(e @ StringType()) => Abs(Cast(e, DoubleType))
       case Sum(e @ StringType()) => Sum(Cast(e, DoubleType))
       case Average(e @ StringType()) => Average(Cast(e, DoubleType))
-      case StddevPop(e @ StringType()) => StddevPop(Cast(e, DoubleType))
-      case StddevSamp(e @ StringType()) => StddevSamp(Cast(e, DoubleType))
+      case StddevPop(e @ StringType(), nullOnDivideByZero) =>
+        StddevPop(Cast(e, DoubleType), nullOnDivideByZero)

Review comment:
       Can we use
   ```
   case s @ StddevPop(e @ StringType(), _) =>
     s.withNewChildren(Seq(Cast(e, DoubleType)))
   ```
   
   This retains any `TreeNodeTag`s set on the node.






[GitHub] [spark] cloud-fan commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503417842



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out
##########
@@ -141,17 +141,17 @@ struct<var_samp(CAST(CAST(udf(ansi_cast(ansi_cast(b as decimal(38,0)) as string)
 -- !query
 SELECT udf(var_pop(1.0)), var_samp(udf(2.0))
 -- !query schema
-struct<CAST(udf(ansi_cast(var_pop(ansi_cast(1.0 as double)) as string)) AS DOUBLE):double,var_samp(CAST(CAST(udf(ansi_cast(2.0 as string)) AS DECIMAL(2,1)) AS DOUBLE)):double>
+struct<CAST(udf(ansi_cast(var_pop(ansi_cast(1.0 as double), true) as string)) AS DOUBLE):double,var_samp(CAST(CAST(udf(ansi_cast(2.0 as string)) AS DECIMAL(2,1)) AS DOUBLE)):double>

Review comment:
       The legacy config is internal, and all the functions in one query should be either all legacy or all non-legacy, so I think we don't need to display the legacy flag value. We can override `stringArgs` in these functions (in the base classes) to exclude the legacy flag.
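
   A rough standalone illustration of the idea, not Spark's actual `TreeNode`. The names `Node`, `describe`, `ShownFlag` and `HiddenFlag` are hypothetical, and the sketch assumes the displayed string is built from `stringArgs`, defaulting to every constructor argument:

   ```scala
   object StringArgsSketch {
     // Hypothetical stand-in for a tree node whose string form lists its arguments.
     abstract class Node extends Product {
       // By default every constructor argument is displayed.
       protected def stringArgs: Iterator[Any] = productIterator

       def describe: String = {
         val args = stringArgs.mkString(", ")
         s"$productPrefix($args)"
       }
     }

     // Default behaviour: the flag leaks into the displayed string.
     case class ShownFlag(child: String, nullOnDivideByZero: Boolean) extends Node

     // Overriding stringArgs keeps the flag out of the displayed string.
     case class HiddenFlag(child: String, nullOnDivideByZero: Boolean) extends Node {
       override protected def stringArgs: Iterator[Any] = Iterator(child)
     }
   }
   ```

   Here `ShownFlag("x", true).describe` gives `ShownFlag(x, true)`, while `HiddenFlag("x", true).describe` gives `HiddenFlag(x)`, which is the effect wanted for the generated schema strings in the golden files.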






[GitHub] [spark] SparkQA removed a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706477255


   **[Test build #129610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129610/testReport)** for PR 29983 at commit [`a853132`](https://github.com/apache/spark/commit/a853132353dbbb5f446fd9181c955de0f9bcc43e).




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706661044








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707126515


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34295/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706920156








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707268129


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129696/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706998185








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-707093808


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129684/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706942553





