You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2017/10/14 05:12:28 UTC

[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/19496

    [SPARK-22271][SQL]mean overflows and returns null for some decimal variables

    
    
    ## What changes were proposed in this pull request?
    
    In Average.scala, it has
    ```
      override lazy val evaluateExpression = child.dataType match {
        case DecimalType.Fixed(p, s) =>
          // increase the precision and scale to prevent precision loss
          val dt = DecimalType.bounded(p + 14, s + 4)
          Cast(Cast(sum, dt) / Cast(count, dt), resultType)
        case _ =>
          Cast(sum, resultType) / Cast(count, resultType)
      }
    
      def setChild (newchild: Expression) = {
        child = newchild
      }
    
    ```
    It is possible that  Cast(count, dt), resultType) will make the precision of the decimal number bigger than 38, and this causes over flow.  Since count is an integer and doesn't need a scale, I will cast it using DecimalType.bounded(38,0)
    ## How was this patch tested?
    In DataFrameSuite, I will add a test case. 
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-22271

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19496
    
----
commit a3437ee4a87d1f51b362adeb20d4fcc264085ba7
Author: Huaxin Gao <hu...@us.ibm.com>
Date:   2017-10-14T04:45:27Z

    [SPARK-22271][SQL]mean overflows and returns null for some decimal variables

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82770/testReport)** for PR 19496 at commit [`72467e1`](https://github.com/apache/spark/commit/72467e175d626648451940bdbccaf7866b2ded6c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19496#discussion_r145034645
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
           Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
       }
    +
    +  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
    +    val d = 0.034567890
    +    val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
    +    val result = df.select('DecimalCol cast DecimalType(38, 33))
    +        .select(col("DecimalCol")).describe()
    --- End diff --
    
    Nit: two space indents.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    @gatorsmile Thank you very much for your help!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19496#discussion_r144723280
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
           Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
       }
    +
    +  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
    +    val d: BigDecimal = BigDecimal(0.034567890)
    --- End diff --
    
    Do we need `: BigDecimal`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19496


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82755/testReport)** for PR 19496 at commit [`a3437ee`](https://github.com/apache/spark/commit/a3437ee4a87d1f51b362adeb20d4fcc264085ba7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82755/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19496#discussion_r145182120
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
           Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
       }
    +
    +  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
    +    val d = 0.034567890
    +    val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
    +    val result = df.select('DecimalCol cast DecimalType(38, 33))
    +        .select(col("DecimalCol")).describe()
    +    val mean = result.select("DecimalCol").where($"summary" === "mean")
    +    assert(mean.collect.toSet === Set(Row("0.0345678900000000000000000000000000000")))
    --- End diff --
    
    @gatorsmile Thanks Sean for your review. I will fix the problems. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19496#discussion_r144732734
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
           Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
       }
    +
    +  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
    +    val d: BigDecimal = BigDecimal(0.034567890)
    --- End diff --
    
    It's not necessary. I will remove. Thanks for your review. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82846/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19496#discussion_r144689475
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -80,7 +80,8 @@ case class Average(child: Expression) extends DeclarativeAggregate with Implicit
         case DecimalType.Fixed(p, s) =>
           // increase the precision and scale to prevent precision loss
           val dt = DecimalType.bounded(p + 14, s + 4)
    -      Cast(Cast(sum, dt) / Cast(count, dt), resultType)
    +      Cast(Cast(sum, dt) / Cast(count, DecimalType.bounded (DecimalType.MAX_PRECISION, 0)),
    --- End diff --
    
    No need to add space after `bounded`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19496#discussion_r145034923
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
           Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
       }
    +
    +  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
    +    val d = 0.034567890
    +    val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
    +    val result = df.select('DecimalCol cast DecimalType(38, 33))
    +        .select(col("DecimalCol")).describe()
    +    val mean = result.select("DecimalCol").where($"summary" === "mean")
    +    assert(mean.collect.toSet === Set(Row("0.0345678900000000000000000000000000000")))
    --- End diff --
    
    Nit: `collect` -> `collect()`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82846/testReport)** for PR 19496 at commit [`8438a61`](https://github.com/apache/spark/commit/8438a6171e015d4ad239c1562254635ee5ed51ce).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82846/testReport)** for PR 19496 at commit [`8438a61`](https://github.com/apache/spark/commit/8438a6171e015d4ad239c1562254635ee5ed51ce).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82755/testReport)** for PR 19496 at commit [`a3437ee`](https://github.com/apache/spark/commit/a3437ee4a87d1f51b362adeb20d4fcc264085ba7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    OK to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    LGTM
    
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82762/testReport)** for PR 19496 at commit [`de2aa69`](https://github.com/apache/spark/commit/de2aa6975c31f4c095e07a34b66b24ee39f83b01).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82770/testReport)** for PR 19496 at commit [`72467e1`](https://github.com/apache/spark/commit/72467e175d626648451940bdbccaf7866b2ded6c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82770/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82762/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    **[Test build #82762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82762/testReport)** for PR 19496 at commit [`de2aa69`](https://github.com/apache/spark/commit/de2aa6975c31f4c095e07a34b66b24ee39f83b01).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19496: [SPARK-22271][SQL]mean overflows and returns null for so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19496
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org