You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2017/10/14 05:12:28 UTC
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/19496
[SPARK-22271][SQL]mean overflows and returns null for some decimal variables
## What changes were proposed in this pull request?
In Average.scala, it has
```
override lazy val evaluateExpression = child.dataType match {
case DecimalType.Fixed(p, s) =>
// increase the precision and scale to prevent precision loss
val dt = DecimalType.bounded(p + 14, s + 4)
Cast(Cast(sum, dt) / Cast(count, dt), resultType)
case _ =>
Cast(sum, resultType) / Cast(count, resultType)
}
def setChild (newchild: Expression) = {
child = newchild
}
```
It is possible that Cast(count, dt), resultType) will make the precision of the decimal number bigger than 38, and this causes over flow. Since count is an integer and doesn't need a scale, I will cast it using DecimalType.bounded(38,0)
## How was this patch tested?
In DataFrameSuite, I will add a test case.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark-22271
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19496.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19496
----
commit a3437ee4a87d1f51b362adeb20d4fcc264085ba7
Author: Huaxin Gao <hu...@us.ibm.com>
Date: 2017-10-14T04:45:27Z
[SPARK-22271][SQL]mean overflows and returns null for some decimal variables
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82770/testReport)** for PR 19496 at commit [`72467e1`](https://github.com/apache/spark/commit/72467e175d626648451940bdbccaf7866b2ded6c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19496#discussion_r145034645
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
}
+
+ test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+ val d = 0.034567890
+ val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
+ val result = df.select('DecimalCol cast DecimalType(38, 33))
+ .select(col("DecimalCol")).describe()
--- End diff --
Nit: two space indents.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/19496
@gatorsmile Thank you very much for your help!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19496#discussion_r144723280
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
}
+
+ test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+ val d: BigDecimal = BigDecimal(0.034567890)
--- End diff --
Do we need `: BigDecimal`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19496
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82755/testReport)** for PR 19496 at commit [`a3437ee`](https://github.com/apache/spark/commit/a3437ee4a87d1f51b362adeb20d4fcc264085ba7).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82755/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19496#discussion_r145182120
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
}
+
+ test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+ val d = 0.034567890
+ val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
+ val result = df.select('DecimalCol cast DecimalType(38, 33))
+ .select(col("DecimalCol")).describe()
+ val mean = result.select("DecimalCol").where($"summary" === "mean")
+ assert(mean.collect.toSet === Set(Row("0.0345678900000000000000000000000000000")))
--- End diff --
@gatorsmile Thanks Sean for your review. I will fix the problems.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19496#discussion_r144732734
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
}
+
+ test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+ val d: BigDecimal = BigDecimal(0.034567890)
--- End diff --
It's not necessary. I will remove. Thanks for your review.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82846/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19496#discussion_r144689475
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
@@ -80,7 +80,8 @@ case class Average(child: Expression) extends DeclarativeAggregate with Implicit
case DecimalType.Fixed(p, s) =>
// increase the precision and scale to prevent precision loss
val dt = DecimalType.bounded(p + 14, s + 4)
- Cast(Cast(sum, dt) / Cast(count, dt), resultType)
+ Cast(Cast(sum, dt) / Cast(count, DecimalType.bounded (DecimalType.MAX_PRECISION, 0)),
--- End diff --
No need to add space after `bounded`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19496#discussion_r145034923
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
}
+
+ test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+ val d = 0.034567890
+ val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
+ val result = df.select('DecimalCol cast DecimalType(38, 33))
+ .select(col("DecimalCol")).describe()
+ val mean = result.select("DecimalCol").where($"summary" === "mean")
+ assert(mean.collect.toSet === Set(Row("0.0345678900000000000000000000000000000")))
--- End diff --
Nit: `collect` -> `collect()`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82846/testReport)** for PR 19496 at commit [`8438a61`](https://github.com/apache/spark/commit/8438a6171e015d4ad239c1562254635ee5ed51ce).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82846/testReport)** for PR 19496 at commit [`8438a61`](https://github.com/apache/spark/commit/8438a6171e015d4ad239c1562254635ee5ed51ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82755/testReport)** for PR 19496 at commit [`a3437ee`](https://github.com/apache/spark/commit/a3437ee4a87d1f51b362adeb20d4fcc264085ba7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19496
OK to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19496
LGTM
Thanks! Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82762/testReport)** for PR 19496 at commit [`de2aa69`](https://github.com/apache/spark/commit/de2aa6975c31f4c095e07a34b66b24ee39f83b01).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82770/testReport)** for PR 19496 at commit [`72467e1`](https://github.com/apache/spark/commit/72467e175d626648451940bdbccaf7866b2ded6c).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82770/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82762/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19496
**[Test build #82762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82762/testReport)** for PR 19496 at commit [`de2aa69`](https://github.com/apache/spark/commit/de2aa6975c31f4c095e07a34b66b24ee39f83b01).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19496
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19496
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org