Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/30 16:35:37 UTC

[GitHub] [spark] SelfImpr001 opened a new pull request, #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

SelfImpr001 opened a new pull request, #37732:
URL: https://github.com/apache/spark/pull/37732

   ### What changes were proposed in this pull request?
   Fix a bug seen with Spark 2.4.3 built against Hive 1.2.1: after the specific value 0.00 is written to a table, querying that table fails.
   
   ### Why are the changes needed?
   The bug loses the precision of 0.00, so the stored data is inaccurate, and reading it back throws an exception that aborts the query.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Reproduction:

     create table testgg as select 0.00 as gg;
     select * from testgg;

   The select fails with:

     Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind SECONDARY position: 0 length: 0 range: 0 offset: 0 limit: 0

   How the bad data looks on HDFS: in the ORC file dump below, the DATA and SECONDARY streams of column 1 in the stripe both have length 0.
   ```
   Rows: 1
   Compression: SNAPPY
   Compression size: 262144
   Type: struct<gg:decimal(2,2)>

   Stripe Statistics:
     Stripe 1:
       Column 0: count: 0 hasNull: false
       Column 1: count: 0 hasNull: true

   File Statistics:
     Column 0: count: 0 hasNull: false
     Column 1: count: 0 hasNull: true

   Stripes:
     Stripe: offset: 3 data: 5 rows: 1 tail: 64 index: 35
       Stream: column 0 section ROW_INDEX start: 3 length 11
       Stream: column 1 section ROW_INDEX start: 14 length 24
       Stream: column 1 section PRESENT start: 38 length 5
       Stream: column 1 section DATA start: 43 length 0
       Stream: column 1 section SECONDARY start: 43 length 0
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2

   File length: 213 bytes
   Padding length: 0 bytes
   Padding ratio: 0%
   ```
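
   The `decimal(2,2)` type in the dump above is consistent with how the JDK describes the value 0.00. The following is a minimal sketch using only `java.math.BigDecimal`, with no Spark classes involved. It shows that the JDK reports precision 1 and scale 2 for `0.00`, so a precision computed as `Math.max(d.precision, d.scale)` (the expression on the `+` line of this PR's diff) comes out as 2. The object name `ZeroDecimalShape` and the helper `inferredPrecision` are made up for illustration. The sketch does not show how Spark 2.4 and Hive 1.2.1 get from that type to the empty ORC streams.

   ```scala
   import java.math.{BigDecimal => JavaBigDecimal}

   // Illustrative only: plain JDK arithmetic, not Spark's actual code path.
   object ZeroDecimalShape {
     // Same expression as the precision argument on the diff's `+` line.
     def inferredPrecision(d: JavaBigDecimal): Int = Math.max(d.precision, d.scale)

     def main(args: Array[String]): Unit = {
       val zero = new JavaBigDecimal("0.00")
       // The JDK defines the precision of any zero value as 1; the scale stays 2.
       println(s"precision=${zero.precision}, scale=${zero.scale}")
       // Combining the two as in the diff gives the type seen in the bad file.
       println(s"decimal(${inferredPrecision(zero)},${zero.scale})")
     }
   }
   ```

   Running `ZeroDecimalShape.main` prints `precision=1, scale=2` followed by `decimal(2,2)`.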
   ### How was this patch tested?
   ```
   spark-sql (default)> 
                      > 
                      > create table testgg as select 0.00 as gg;select * from testgg;
   2022-08-30 17:13:04,014 INFO  (main) [Logging.scala:logInfo(54)] - Parsing command: create table testgg as select 0.00 as gg
   2022-08-30 17:13:08,662 INFO  (main) [Logging.scala:logInfo(54)] - Parsing command: `default`.`testgg`
   22/08/30 17:13:08 INFO SparkSqlParser: Parsing command: `default`.`testgg`
   22/08/30 17:13:08 INFO CatalystSqlParser: Parsing command: decimal(3,2)
   22/08/30 17:13:09 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
   22/08/30 17:13:09 INFO DAGScheduler: ResultStage 1 (processCmd at CliDriver.java:376) finished in 0.408 s
   22/08/30 17:13:09 INFO DAGScheduler: Job 1 finished: processCmd at CliDriver.java:376, took 0.413011 s
   gg
   0
   Time taken: 0.767 seconds, Fetched 1 row(s)
   22/08/30 17:13:09 INFO SparkSQLCLIDriver: Time taken: 0.767 seconds, Fetched 1 row(s)
   ```
   With the fix applied, the query returns the correct result. The ORC file on HDFS now looks like this:
   ```
   Rows: 1
   Compression: SNAPPY
   Compression size: 262144
   Type: struct<gg:decimal(3,2)>

   Stripe Statistics:
     Stripe 1:
       Column 0: count: 0 hasNull: false
       Column 1: count: 1 hasNull: false min: 1 max: 1 sum: 1

   File Statistics:
     Column 0: count: 0 hasNull: false
     Column 1: count: 1 hasNull: false min: 1 max: 1 sum: 1

   Stripes:
     Stripe: offset: 3 data: 10 rows: 1 tail: 58 index: 40
       Stream: column 0 section ROW_INDEX start: 3 length 11
       Stream: column 1 section ROW_INDEX start: 14 length 29
       Stream: column 1 section DATA start: 43 length 4
       Stream: column 1 section SECONDARY start: 47 length 6
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2

   File length: 229 bytes
   Padding length: 0 bytes
   Padding ratio: 0%
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r962452438


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   @SelfImpr001 Please upgrade to the latest Spark release. 2.4.8 is the last 2.4.x release, and no further 2.4.x releases should be expected, even for bug fixes. Please see: https://spark.apache.org/versioning-policy.html
   





[GitHub] [spark] SelfImpr001 commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
SelfImpr001 commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r962252317


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   Yes, the problem only occurs with Spark 2.4 + Hive 1.x. However, many people are still using Spark 2.4, so I recommend providing a patch.





[GitHub] [spark] AmplabJenkins commented on pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37732:
URL: https://github.com/apache/spark/pull/37732#issuecomment-1233284400

   Can one of the admins verify this patch?




[GitHub] [spark] srowen commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r959815532


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   Use JavaBigDecimal.ZERO here.
   But wait, is this only a problem for Spark 2.4 + Hive 1.x? That's EOL now.
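
   The `JavaBigDecimal.ZERO` suggestion can be illustrated with a short sketch that uses only the JDK. It compares three ways of testing a `java.math.BigDecimal` for zero. The object `ZeroCheck` and its method names are hypothetical and are not part of the patch.

   ```scala
   import java.math.{BigDecimal => JavaBigDecimal}

   // Illustrative only: three ways to ask "is this BigDecimal zero?".
   object ZeroCheck {
     // Reuses the shared constant instead of allocating new JavaBigDecimal("0").
     // compareTo ignores scale, so abs() is not needed for a zero test.
     def isZeroByCompare(d: JavaBigDecimal): Boolean =
       d.compareTo(JavaBigDecimal.ZERO) == 0

     // signum() returns 0 exactly when the value is zero.
     def isZeroBySignum(d: JavaBigDecimal): Boolean = d.signum == 0

     // equals() also compares scale, so 0.00 (scale 2) differs from ZERO (scale 0).
     def isZeroByEquals(d: JavaBigDecimal): Boolean = d.equals(JavaBigDecimal.ZERO)

     def main(args: Array[String]): Unit = {
       val d = new JavaBigDecimal("0.00")
       println(isZeroByCompare(d)) // true
       println(isZeroBySignum(d))  // true
       println(isZeroByEquals(d))  // false
     }
   }
   ```

   The `equals` case is the one to avoid for this purpose, since it treats values with different scales as different.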





[GitHub] [spark] srowen commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r962314494


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   OK, but this PR targets master, and there will be no more 2.4 releases.





[GitHub] [spark] srowen commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r967060213


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   We're saying the project no longer makes 2.4.x releases at all, so it wouldn't help anyone on 2.4





[GitHub] [spark] SelfImpr001 commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
SelfImpr001 commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r967056734


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   @srowen @wangyum
   Thank you for your answers. Many developers' environments are still on 2.4.x, and upgrading to the latest version requires a lot of verification, so it is difficult to do in a short time. To spare those developers from this bug in the old version, I suggest merging the PR into the 2.4.x branch, which would not affect the master branch. If it cannot be merged because of the version management policy, please just close this PR.





[GitHub] [spark] wangyum closed pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…
URL: https://github.com/apache/spark/pull/37732




[GitHub] [spark] SelfImpr001 commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

Posted by GitBox <gi...@apache.org>.
SelfImpr001 commented on code in PR #37732:
URL: https://github.com/apache/spark/pull/37732#discussion_r969072089


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -76,8 +76,19 @@ object Literal {
       val decimal = Decimal(d)
       Literal(decimal, DecimalType.fromDecimal(decimal))
     case d: JavaBigDecimal =>
-      val decimal = Decimal(d)
-      Literal(decimal, DecimalType.fromDecimal(decimal))
+      Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
+      if (d.abs().compareTo(new JavaBigDecimal("0")) == 0) {

Review Comment:
   
   Ok, thank you for your answers, I will close the PR now


