You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "chenhao-db (via GitHub)" <gi...@apache.org> on 2023/03/15 01:20:59 UTC

[GitHub] [spark] chenhao-db opened a new pull request, #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.

chenhao-db opened a new pull request, #40429:
URL: https://github.com/apache/spark/pull/40429

   ### What changes were proposed in this pull request?
   This PR fixed the counter-intuitive behaviors of the `ApproximatePercentile` expression mentioned in https://issues.apache.org/jira/browse/SPARK-42775. See the following *user-facing* changes for details.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. When working on decimals, this expression has the same output type as the input type. When the result that doesn't fit into output decimal type, it silently produces the result since the `Decimal` object is capable of representing decimals of arbitrary precision and scale. However, this can lead to weird behaviors because the value doesn't fit into its type. We should throw an exception immediately.
   
   Old results:
   
   ```sql
   -- ApproximatePercentile will first cast decimal value 9999999999999999999 into double, which results in 1E20, and cast 1E20 back into decimal, which doesn't fit the type decimal(19, 0).
   -- Here it is producing "NULL" because the value doesn't fit into its type, but it is actually not NULL.
   spark-sql> select approx_percentile(col, 0.5) from values (9999999999999999999) as tab(col);
   NULL
   spark-sql> select approx_percentile(col, 0.5) is null from values (9999999999999999999) as tab(col);
   false
   spark-sql> select cast(approx_percentile(col, 0.5) as string) from values (9999999999999999999) as tab(col);
   10000000000000000000
   ```
   
   New results:
   
   ```sql
   spark-sql> select approx_percentile(col, 0.5) from values (9999999999999999999) as tab(col);
   throws SparkArithmeticException
   ```
   
   ### How was this patch tested?
   
   Pass existing tests and some new tests.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type. [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.
URL: https://github.com/apache/spark/pull/40429


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type. [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40429:
URL: https://github.com/apache/spark/pull/40429#issuecomment-1751521050

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] chenhao-db commented on pull request #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.

Posted by "chenhao-db (via GitHub)" <gi...@apache.org>.
chenhao-db commented on PR #40429:
URL: https://github.com/apache/spark/pull/40429#issuecomment-1483544934

   Hi @LuciferYang, could you help me review this PR? Or do you know who would be more suitable to review it? Thanks a lot!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40429:
URL: https://github.com/apache/spark/pull/40429#discussion_r1158304312


##########
sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala:
##########
@@ -337,4 +338,19 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
           Row(Period.ofMonths(200).normalized(), null, Duration.ofSeconds(200L)))
     }
   }
+
+  test("SPARK-42775: large decimal should overflow") {
+    withTempView(table) {
+      spark.sql("SELECT 9999999999999999999 as col").createOrReplaceTempView(table)
+      val ex = intercept[SparkException] {
+        checkAnswer(
+          spark.sql(s"SELECT percentile_approx(col, 0.5) FROM $table"),
+          Row(null)
+        )
+      }
+      assert(ex.getCause.isInstanceOf[SparkArithmeticException])
+      assert(ex.getCause.asInstanceOf[SparkArithmeticException].getErrorClass ==

Review Comment:
   @chenhao-db Could you use `checkError()` to check the error class, please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org