You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/08/01 19:27:06 UTC

[GitHub] spark pull request #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with dec...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/21949

    [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal followed by aggregation returns wrong result

    ## What changes were proposed in this pull request?
    
    When we do an average, the result is computed dividing the sum of the values by their count. In the case the result is a DecimalType, the way we are casting/managing the precision and scale is not really optimized and it is not coherent with what we do normally.
    
    In particular, a problem can happen when the Divide operand returns a result which contains a precision and scale different by the ones which are expected as output of the Divide operand. In the case reported in the JIRA, for instance, the result of the Divide operand is a Decimal(38, 36), while the output data type for Divide is 38, 22. This is not an issue when the Divide is followed by a CheckOverflow or a Cast to the right data type, as these operations return a decimal with the defined precision and scale. Despite in the Average operator we do have a Cast, this may be bypassed if the result of Divide is the same type which it is casted to, hence the issue reported in the JIRA may arise.
    
    The PR proposes to use the normal rules/handling of the arithmetic operators with Decimal data type, so we both reuse the existing code (having a single logic for operations between decimals) and we fix this problem as the result is always guarded by CheckOverflow.
    
    ## How was this patch tested?
    
    added UT


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-24957_2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21949
    
----
commit 1f817a058887090ebf41c510a3b6086a062433d6
Author: Marco Gaido <ma...@...>
Date:   2018-07-29T10:53:09Z

    [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    Thanks, closed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with dec...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 closed the pull request at:

    https://github.com/apache/spark/pull/21949


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    Thanks! Merged to 2.2. 
    
    Could you close this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93900/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1580/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    **[Test build #93900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93900/testReport)** for PR 21949 at commit [`1f817a0`](https://github.com/apache/spark/commit/1f817a058887090ebf41c510a3b6086a062433d6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    **[Test build #93900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93900/testReport)** for PR 21949 at commit [`1f817a0`](https://github.com/apache/spark/commit/1f817a058887090ebf41c510a3b6086a062433d6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by holdensmagicalunicorn <gi...@git.apache.org>.
Github user holdensmagicalunicorn commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    @mgaido91, thanks! I am a bot who has found some folks who might be able to help with the review:@rxin, @marmbrus and @gatorsmile


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21949
  
    cc @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org