Posted to reviews@spark.apache.org by ajithme <gi...@git.apache.org> on 2018/09/12 08:32:13 UTC

[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

GitHub user ajithme opened a pull request:

    https://github.com/apache/spark/pull/22401

    [SPARK-25413] Precision value is going for toss when Avg is done

    As per the definition, see org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    
    Operation :  e1 + e2 
    Result Precision :  max(s1, s2) + max(p1-s1, p2-s2) + 1                   
    Result Scale : max(s1, s2) 
    
    hence sumDataType must also follow this rule
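    
    For illustration (not part of the patch), a minimal, self-contained sketch of
    the rule above: for two operands sharing the same precision and scale it
    reduces to (p + 1, s), which is what the patch passes to sumDataType.
    
    ```scala
    // Plain Scala, no Spark dependency; 38 stands in for DecimalType.MAX_PRECISION.
    object AddPrecisionSketch {
      val MaxPrecision = 38
    
      // Result type of e1 + e2 per DecimalPrecision:
      //   precision = max(s1, s2) + max(p1 - s1, p2 - s2) + 1, scale = max(s1, s2)
      def addResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
        val scale = math.max(s1, s2)
        val precision = scale + math.max(p1 - s1, p2 - s2) + 1
        (math.min(precision, MaxPrecision), scale)
      }
    
      def main(args: Array[String]): Unit = {
        // decimal(2, 1) + decimal(2, 1) => decimal(3, 1), i.e. (p + 1, s)
        println(addResultType(2, 1, 2, 1)) // (3,1)
      }
    }
    ```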


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajithme/spark precision

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22401
    
----
commit 68deebbb5a2d20a37cf2ca03d1a82ba2b8331cfc
Author: Ajith <aj...@...>
Date:   2018-09-12T08:26:05Z

    Correct the precision for sumDataType

----


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    It is difficult for an end user to set this depending on the query (for queries similar to SPARK-25413 and SPARK-24957), but yes, I agree with the point that rounding off the scale is better than getting wrong results. So, just out of curiosity, is there any documentation or pointer explaining why the current sumDataType in average is taken as (p + 10, s)?


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r216954947
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    Just checked for sqlserver https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-2017


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    > is there any documentation or pointer explaining why the current sumDataType in average is taken as (p + 10, s)?
    
    Not that I know of, sorry. It was introduced a long time ago (#7605) and never changed.
    
    Anyway, I think how we manage decimals in aggregates is a valid question. I think we should revisit all these aggregation operations to match SQLServer's behavior for decimals. Probably we can target this for 3.0. WDYT @cloud-fan ?


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    What SQLServer does is explained here: https://docs.microsoft.com/en-us/sql/t-sql/functions/avg-transact-sql?view=sql-server-2017. The point is that we would need to change the result type, which I am not sure we can do before 3.0.
    
    cc @cloud-fan @dongjoon-hyun  @gatorsmile for advice on this.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    I agree with the resolution on + vs. sum, but I also see that the avg precision and scale cannot be calculated ahead of time in a way that satisfies all scenarios. I am only suggesting this as a temporary solution until we decide on changing the result type, so could you suggest how to handle this? Our use case is broken beyond 2.3.1.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r217049563
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    You're right: because we are not checking for overflow in the Add operation, we don't detect the error condition even though we are in one. But it doesn't sound great to me to rely on a currently missing check. I am also not sure whether, in some special cases, this could cause an issue even with the check missing as it is today.
    
    Moreover, as you can see from the link I posted, SQLServer - which is the reference for the way we handle decimals here - uses decimal(38, s) divided by decimal(10, 0). So I think this is what we should eventually do, but it implies changing the result type.
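    
    For reference, a rough sketch (my reading of the `e1 / e2` rule documented in
    DecimalPrecision, not the actual Spark code) of what those operand types would
    give, before the result is adjusted back under the 38-digit cap:
    
    ```scala
    object AvgDivideSketch {
      // e1 / e2 per DecimalPrecision:
      //   scale = max(6, s1 + p2 + 1), precision = p1 - s1 + s2 + scale
      def divideResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
        val scale = math.max(6, s1 + p2 + 1)
        val precision = p1 - s1 + s2 + scale
        (precision, scale)
      }
    
      def main(args: Array[String]): Unit = {
        // SQLServer-style avg: decimal(38, s) / decimal(10, 0), e.g. with s = 4,
        // gives a raw (49, 15), which would then be shrunk to fit 38 digits.
        println(divideResultType(38, 4, 10, 0)) // (49,15)
      }
    }
    ```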


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r217038517
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    Well, I tested as per your suggestion with my PR:
    ```
     sql("create table if not exists table1(salary decimal(2,1))")
     (1 to 22).foreach(_ => sql("insert into table1 values(9.1)"))
     sql("select avg(salary) from table1").show(false)
    
    +-----------+
    |avg(salary)|
    +-----------+
    |9.10000    |
    +-----------+
    ```
    which is the expected result, and I don't see an overflow, as the divide will readjust the precision. Can you test with my patch for an overflow specifically in the case of average?


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r217020447
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    Well, in your example, with input data of decimal(2,1), this "buffer" with your change would be a decimal(3, 1). If your input data contains 21 items of 9.1, this would overflow (191.1 doesn't fit in a decimal(3,1)).
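    
    (For the record, a quick check of the arithmetic, assuming the patched buffer
    type decimal(3, 1), whose largest representable value is 99.9 - plain Scala,
    no Spark needed:)
    
    ```scala
    object OverflowCheck {
      def main(args: Array[String]): Unit = {
        val sum = BigDecimal("9.1") * 21         // 191.1
        val maxOfDecimal3_1 = BigDecimal("99.9") // upper bound of decimal(3, 1)
        println(s"sum = $sum, fits decimal(3,1) = ${sum <= maxOfDecimal3_1}") // false
      }
    }
    ```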


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r216981131
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    No, we can't, because we would risk (well, we would likely hit) an overflow. Indeed, I am not sure whether you ran all the UTs with your change, but I'd expect many failures due to overflow after this.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    So, to summarize the discussion:
    1. The current code, which uses (p + 10, s), is still a mystery, but it breaks the use case mentioned in SPARK-25413.
    2. The changes in this PR work only because of the missing check in the Add operation, so relying on that is not OK. However, we do not have an example where the existing code plus this PR breaks.
    3. It is better to follow the SQLServer behaviour, yes, agreed. Is 3.0 perhaps too late?


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    According to the PR that introduced the `+ 10` behavior, this follows Hive.
    
    Whatever we want to propose, let's clearly write down the tradeoffs, e.g. to keep a larger precision, we are more likely to hit overflow, etc.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r216953011
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    The point is that here the sum operation is executed many times, not only once, so I am not sure that this is the right way to deal with it. It would be great to check what other RDBMSs do in this case.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    @ajithme I don't think this is a good solution. We had to change this because of bugs in the previous implementation which could lead to wrong results. Here there is no wrong result; the only difference is a lower precision (which we anyway guarantee to be at least 6 digits after the decimal point in any case/any operation). Do you really need a higher precision?
    
    One thing you may try is to set `spark.sql.decimalOperations.allowPrecisionLoss` to `false`, but I am not sure it will help.
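    
    A minimal sketch of trying that flag (Spark 2.3+), building the same data as the
    example earlier in the thread with a DataFrame instead of a Hive table; whether
    it actually helps this query is exactly what is uncertain here:
    
    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.functions.avg
    import org.apache.spark.sql.types.{DecimalType, StructField, StructType}
    
    object AllowPrecisionLossDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("allowPrecisionLoss-demo")
          .getOrCreate()
    
        // false = prefer risking a NULL result on overflow over losing any scale
        spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
    
        val schema = StructType(Seq(StructField("salary", DecimalType(2, 1))))
        val rows = (1 to 22).map(_ => Row(BigDecimal("9.1")))
        val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    
        df.agg(avg("salary")).show(false)
    
        spark.stop()
      }
    }
    ```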


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    Just to be clear:
    
    > current code which does p+10 breaks the use case mentioned by SPARK-25413
    
    I think setting `spark.sql.decimalOperations.allowPrecisionLoss` to `false` can solve the issue; the reason that flag exists is to be used in situations where a precision loss is not acceptable (i.e. you prefer to risk having a NULL in the result rather than losing any precision). If this is your case, please use it.
    
    Moreover, I don't think the major issue is the `+10` compared to SQLServer's behavior (SQLServer uses decimal(38, s)). Our major difference from SQLServer's behavior is the cast to the result type, which SQLServer does not have.



---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r216979582
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    Yes, I agree. But the point is that arbitrarily increasing the precision by 10 can cause loss of scale more often, and calculating the count up front is costly. As avg = sum(data) / count, can we restrict the precision and scale of sum(data) as described for the + operation?
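    
    A back-of-the-envelope comparison (my own sketch, not Spark code) of how the two
    rules size the sum buffer for an input decimal(p, s), restricted to p < 38 so
    that adjustPrecisionScale does not need to adjust anything:
    
    ```scala
    object SumBufferSizing {
      val MaxPrecision = 38
    
      // Current code: DecimalType.bounded(p + 10, s) just caps the precision at 38.
      def current(p: Int, s: Int): (Int, Int) = (math.min(p + 10, MaxPrecision), s)
    
      // This PR: adjustPrecisionScale(s + (p - s) + 1, s), i.e. (p + 1, s) for p < 38.
      def proposed(p: Int, s: Int): (Int, Int) = (p + 1, s)
    
      def main(args: Array[String]): Unit = {
        println(current(2, 1))     // (12,1)
        println(proposed(2, 1))    // (3,1)
        println(current(36, 30))   // (38,30)
        println(proposed(36, 30))  // (37,30)
      }
    }
    ```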


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22401


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    @ajithme it would be great if you could update the description to follow Spark's PR template. Moreover, please add relevant UTs for this change. Thanks.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    It's not only about `avg`; it's also about `sum`.
    
    I don't think the decision was made randomly; IIRC we did check other databases and picked the best option we could.
    
    `sum` is definitely different from `+`; we can't just follow `+` here.


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by ajithme <gi...@git.apache.org>.
Github user ajithme commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r216984413
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    But the division operation will readjust the precision again in average. Can you please give me an example query which can cause the overflow you explained?


---



[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22401
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22401: [SPARK-25413] Precision value is going for toss w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r216956864
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of sum of decimal ( assuming another decimal of same precision and scale)
    +     * Refer : org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision : max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale : max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff --
    
    No, this is what SQLServer does for the `+` operation, not for the `avg` result. There is a big difference between the intermediate result of avg and +, as here the + operation is executed once per row (the exact number of times is not known in advance).


---
