You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/02/28 02:03:44 UTC

[GitHub] [spark] beliefer opened a new pull request, #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType

beliefer opened a new pull request, #40204:
URL: https://github.com/apache/spark/pull/40204

   ### What changes were proposed in this pull request?
   Introduce a new physical type `Decimal128` for `DecimalType`.
   
   
   ### Why are the changes needed?
   Spark SQL today supports the Decimal data type. The implementation of Spark Decimal holds a BigDecimal or Long value. Spark Decimal provides some operators like +, -, *, /, % and so on. These operators rely heavily on the computational power of BigDecimal or Long itself. For ease of understanding, take the + as an example. The implementation shows below.
   ```
     def + (that: Decimal): Decimal = {
       if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
         Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
       } else {
         Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
       }
     }
   ```
   We can see the + of Long will be called if Spark Decimal holds a Long value. Otherwise,  the add of BigDecimal will be called if Spark Decimal holds a BigDecimal value. The other operators of Spark Decimal adopt the similar way. Furthermore, the code shown above calls Decimal.apply to construct a new instance of Spark Decimal.  As we know, the add operator of BigDecimal constructed a new instance of BigDecimal. So, if we call the + operator of Spark Decimal who holds a Long value, Spark will construct a new instance of Decimal. Otherwise, Spark will construct a new instance of BigDecimal and a new instance of Decimal.
   Through rough analysis, we know:
   1. The computational power of Spark Decimal may depend on BigDecimal.
   2. The calculation operators of Spark Decimal create a lot of new instances of Decimal and may create a lot of new instances of BigDecimal.
   If a large table has a field called 'colA whose type is Spark Decimal, the execution of SUM('colA) will involve the creation of a large number of Spark Decimal instances and BigDecimal instances. These Spark Decimal instances and BigDecimal instances will lead to garbage collection frequently.
   Int128 is a high-performance data type about 1X~6X more efficient than BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). The implementation of Int128 just uses two Long values to represent the high and low bits of 128 bits respectively. Int128 is lighter than BigDecimal, reduces the cost of new() and garbage collection.
   If Spark could support a new type to represent the decimal type with Int128, it would improve the performance for calculation operators.
   Now, let's call the new decimal type Decimal128.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature.
   
   
   ### How was this patch tested?
   New test cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42601][SQL] New physical type Decimal128 for DecimalType [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType
URL: https://github.com/apache/spark/pull/40204


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40204:
URL: https://github.com/apache/spark/pull/40204#issuecomment-1582152357

   > Do you mind pointing me in the code when `Long` is used by `Decimal` instead of `BigDecimal`? Thanks
   
   Because the performance of `BigDecimal` is bad.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40204:
URL: https://github.com/apache/spark/pull/40204#issuecomment-1534265498

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42601][SQL] New physical type Decimal128 for DecimalType [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40204:
URL: https://github.com/apache/spark/pull/40204#issuecomment-1806588440

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Tagar commented on pull request #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType

Posted by "Tagar (via GitHub)" <gi...@apache.org>.
Tagar commented on PR #40204:
URL: https://github.com/apache/spark/pull/40204#issuecomment-1582014032

   > The implementation of Spark Decimal holds a BigDecimal or Long value
   
   Do you mind pointing me in the code when `Long` is used by `Decimal` instead of `BigDecimal`? Thanks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42601][SQL] New physical type Decimal128 for DecimalType [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40204:
URL: https://github.com/apache/spark/pull/40204#issuecomment-1953304347

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org