You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/22 03:06:19 UTC
[GitHub] [spark] wangyum opened a new pull request, #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
wangyum opened a new pull request, #36628:
URL: https://github.com/apache/spark/pull/36628
### What changes were proposed in this pull request?
Switch decimal type divide from
```
toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT)
```
to
```
toJavaBigDecimal.divide(that.toJavaBigDecimal, DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode)
```
### Why are the changes needed?
Improve divide performance for decimal type.
Benchmark code:
```scala
import org.apache.spark.benchmark.Benchmark
val valuesPerIteration = 2880404L
val dir = "/tmp/spark/benchmark"
spark.range(valuesPerIteration).selectExpr("CAST(id AS DECIMAL(9, 2)) AS d").write.mode("Overwrite").parquet(dir)
val benchmark = new Benchmark("Benchmark decimal", valuesPerIteration, minNumIters = 5)
benchmark.addCase("d * 2 > 0") { _ =>
spark.read.parquet(dir).where("d * 2 > 0").write.format("noop").mode("Overwrite").save()
}
benchmark.addCase("d / 2 > 0") { _ =>
spark.read.parquet(dir).where("d / 2 > 0").write.format("noop").mode("Overwrite").save()
}
benchmark.run()
```
Before this PR:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark decimal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
d * 2 > 0 480 585 141 6.0 166.7 1.0X
d / 2 > 0 4689 4920 243 0.6 1627.9 0.1X
```
After this PR:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark decimal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
d * 2 > 0 529 580 35 5.4 183.6 1.0X
d / 2 > 0 811 916 80 3.6 281.4 0.7X
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880067844
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {
Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal, MATH_CONTEXT))
def / (that: Decimal): Decimal =
- if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT))
+ if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal,
+ DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode))
Review Comment:
what does Hive use after 2.2?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36628:
URL: https://github.com/apache/spark/pull/36628#issuecomment-1135948672
thanks, merging to master!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880122085
##########
sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out:
##########
@@ -142,6 +142,108 @@ struct<(12345678912345.123456789123 / 1.2345678E-8):decimal(38,9)>
1000000073899961059796.725866332
+-- !query
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkArithmeticException
+[CANNOT_CHANGE_DECIMAL_PRECISION] Decimal(expanded, 10123456789012345678901234567890123456.00000000000000000000000000000000000000, 76, 38) cannot be represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
+== SQL(line 1, position 7) ==
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+-- !query
+select 1.01234567890123456789012345678901234567e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+decimal can only support precision up to 38(line 1, pos 7)
+
+== SQL ==
+select 1.01234567890123456789012345678901234567e36BD / 0.1
Review Comment:
Removed it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
URL: https://github.com/apache/spark/pull/36628
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880068159
##########
sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out:
##########
@@ -142,6 +142,108 @@ struct<(12345678912345.123456789123 / 1.2345678E-8):decimal(38,9)>
1000000073899961059796.725866332
+-- !query
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkArithmeticException
+[CANNOT_CHANGE_DECIMAL_PRECISION] Decimal(expanded, 10123456789012345678901234567890123456.00000000000000000000000000000000000000, 76, 38) cannot be represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
+== SQL(line 1, position 7) ==
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+-- !query
+select 1.01234567890123456789012345678901234567e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+decimal can only support precision up to 38(line 1, pos 7)
+
+== SQL ==
+select 1.01234567890123456789012345678901234567e36BD / 0.1
Review Comment:
I don't think this test is useful. It fails at the parser side.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
wangyum commented on PR #36628:
URL: https://github.com/apache/spark/pull/36628#issuecomment-1135393260
cc @cloud-fan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880095078
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {
Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal, MATH_CONTEXT))
def / (that: Decimal): Decimal =
- if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT))
+ if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal,
+ DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode))
Review Comment:
Implemented fast decimal since 2.2: https://issues.apache.org/jira/browse/HIVE-15335
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r878789366
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {
Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal, MATH_CONTEXT))
def / (that: Decimal): Decimal =
- if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT))
+ if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal,
+ DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode))
Review Comment:
Hive also use this API before 2.2:
https://github.com/apache/hive/blob/rel/release-2.1.1/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java#L233-L235
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org