You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/22 03:06:19 UTC

[GitHub] [spark] wangyum opened a new pull request, #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

wangyum opened a new pull request, #36628:
URL: https://github.com/apache/spark/pull/36628

   ### What changes were proposed in this pull request?
   
   Switch decimal type divide from
   ```
   toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT)
   ```
   to
   ```
   toJavaBigDecimal.divide(that.toJavaBigDecimal, DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode)
   ```
   
   ### Why are the changes needed?
   
   Improve divide performance for decimal type.
   
   Benchmark code:
   ```scala
   import org.apache.spark.benchmark.Benchmark
   
   val valuesPerIteration = 2880404L
   val dir = "/tmp/spark/benchmark"
   spark.range(valuesPerIteration).selectExpr("CAST(id AS DECIMAL(9, 2)) AS d").write.mode("Overwrite").parquet(dir)
   
   val benchmark = new Benchmark("Benchmark decimal", valuesPerIteration, minNumIters = 5)
   benchmark.addCase("d * 2 > 0") { _ =>
     spark.read.parquet(dir).where("d * 2 > 0").write.format("noop").mode("Overwrite").save()
   }
   
   benchmark.addCase("d / 2 > 0") { _ =>
     spark.read.parquet(dir).where("d / 2 > 0").write.format("noop").mode("Overwrite").save()
   }
   benchmark.run()
   ```
   
   Before this PR:
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
   Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
   Benchmark decimal:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   d * 2 > 0                                           480            585         141          6.0         166.7       1.0X
   d / 2 > 0                                          4689           4920         243          0.6        1627.9       0.1X
   ```
   
   After this PR:
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
   Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
   Benchmark decimal:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   d * 2 > 0                                           529            580          35          5.4         183.6       1.0X
   d / 2 > 0                                           811            916          80          3.6         281.4       0.7X
   ```
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880067844


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {
     Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal, MATH_CONTEXT))
 
   def / (that: Decimal): Decimal =
-    if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT))
+    if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal,
+      DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode))

Review Comment:
   what does Hive use after 2.2?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36628:
URL: https://github.com/apache/spark/pull/36628#issuecomment-1135948672

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880122085


##########
sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out:
##########
@@ -142,6 +142,108 @@ struct<(12345678912345.123456789123 / 1.2345678E-8):decimal(38,9)>
 1000000073899961059796.725866332
 
 
+-- !query
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkArithmeticException
+[CANNOT_CHANGE_DECIMAL_PRECISION] Decimal(expanded, 10123456789012345678901234567890123456.00000000000000000000000000000000000000, 76, 38) cannot be represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
+== SQL(line 1, position 7) ==
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+-- !query
+select 1.01234567890123456789012345678901234567e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+decimal can only support precision up to 38(line 1, pos 7)
+
+== SQL ==
+select 1.01234567890123456789012345678901234567e36BD / 0.1

Review Comment:
   Removed it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type
URL: https://github.com/apache/spark/pull/36628


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880068159


##########
sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out:
##########
@@ -142,6 +142,108 @@ struct<(12345678912345.123456789123 / 1.2345678E-8):decimal(38,9)>
 1000000073899961059796.725866332
 
 
+-- !query
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkArithmeticException
+[CANNOT_CHANGE_DECIMAL_PRECISION] Decimal(expanded, 10123456789012345678901234567890123456.00000000000000000000000000000000000000, 76, 38) cannot be represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
+== SQL(line 1, position 7) ==
+select 1.0123456789012345678901234567890123456e36BD / 0.1
+       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+-- !query
+select 1.01234567890123456789012345678901234567e36BD / 0.1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+decimal can only support precision up to 38(line 1, pos 7)
+
+== SQL ==
+select 1.01234567890123456789012345678901234567e36BD / 0.1

Review Comment:
   I don't think this test is useful. It fails at the parser side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
wangyum commented on PR #36628:
URL: https://github.com/apache/spark/pull/36628#issuecomment-1135393260

   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r880095078


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {
     Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal, MATH_CONTEXT))
 
   def / (that: Decimal): Decimal =
-    if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT))
+    if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal,
+      DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode))

Review Comment:
   Implemented fast decimal since 2.2: https://issues.apache.org/jira/browse/HIVE-15335



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36628:
URL: https://github.com/apache/spark/pull/36628#discussion_r878789366


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {
     Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal, MATH_CONTEXT))
 
   def / (that: Decimal): Decimal =
-    if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, MATH_CONTEXT))
+    if (that.isZero) null else Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal,
+      DecimalType.MAX_SCALE, MATH_CONTEXT.getRoundingMode))

Review Comment:
   Hive also use this API before 2.2:
   https://github.com/apache/hive/blob/rel/release-2.1.1/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java#L233-L235



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org