Posted to reviews@spark.apache.org by "viirya (via GitHub)" <gi...@apache.org> on 2023/11/07 05:53:18 UTC

[PR] [SPARK-45816][SQL] Return null when overflowing during casting from timestamp to integers [spark]

viirya opened a new pull request, #43694:
URL: https://github.com/apache/spark/pull/43694

   
   ### What changes were proposed in this pull request?
   
   Spark casts work in two modes: ANSI and non-ANSI. When a cast overflows under non-ANSI mode, the common behavior is to return null. However, casting from Timestamp to Int/Short/Byte currently returns a wrapped value. Silently overflowing does not make sense here, so this patch switches these casts to the common behavior, i.e., returning null.
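   
   A minimal sketch of the check described above, not the actual `Cast.scala` code. The object name, the `Option` return type (standing in for SQL NULL), the plain `ArithmeticException`, and the microsecond-based `timestampToLong` helper are all assumptions made for illustration.
   
   ```scala
   object TimestampToIntSketch {
     // Assumed stand-in for Spark's internal helper: microseconds since the
     // epoch, floored to whole seconds.
     def timestampToLong(micros: Long): Long = Math.floorDiv(micros, 1000000L)
   
     // Some(v) when the seconds fit in an Int. On overflow: throw under ANSI
     // mode, otherwise return None (modelling SQL NULL).
     def castToInt(micros: Long, ansiEnabled: Boolean): Option[Int] = {
       val longValue = timestampToLong(micros)
       if (longValue == longValue.toInt) {
         Some(longValue.toInt)
       } else if (ansiEnabled) {
         // Spark raises its own overflow error here; this exception is a placeholder.
         throw new ArithmeticException(s"Casting $longValue to INT causes overflow")
       } else {
         None
       }
     }
   }
   ```
   
   The round-trip comparison `longValue == longValue.toInt` is the same test used in the diff under review: it holds only when narrowing loses no bits.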
   
   ### Why are the changes needed?
   
   Returning a wrapped value (e.g., a negative one) when casting Timestamp to Int/Short/Byte can easily lead users to misinterpret the cast result. We should also follow the common overflow-handling behavior.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. When casting from Timestamp to Int/Short/Byte overflows under non-ANSI mode, Spark now returns null instead of a wrapped value.
   
   ### How was this patch tested?
   
   I will add tests, or update existing ones if any fail.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43694:
URL: https://github.com/apache/spark/pull/43694#issuecomment-1800279966

   Yes, we can ignore that.




Re: [PR] [SPARK-45816][SQL] Return null when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384416801


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   While working on related code, I took a look at how Spark handles casting timestamp to integer. Unless there is a special reason for it, this behavior doesn't look correct to me, and it is not how overflow is commonly handled when casting other types under non-ANSI mode.
   
   





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43694:
URL: https://github.com/apache/spark/pull/43694#issuecomment-1801794704

   The GA failure is unrelated.
   Merged to master. @viirya Thank you.
   Thank you @dongjoon-hyun @cloud-fan @MaxGekk too.




Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384782328


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`.
 - Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions.
 - Since Spark 4.0, `spark.sql.parquet.compression.codec` drops the support of codec name `lz4raw`, please use `lz4_raw` instead.
+- Since Spark 4.0, when overflowing during casting timestamp to byte/short/int under non-ansi mode, Spark will return null instead a wrapping value.

Review Comment:
   For numeric casts, we actually have 3 behaviors:
   1. the default: return overflowed value
   2. ansi mode: throw overflow error
   3. `try_cast`: return null if overflows
   
   Shall we do the same thing for timestamp?
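   
   The three behaviors listed above can be sketched as follows. This is only an illustration of the comment, not Spark's implementation: the `Mode` names, the `narrowToInt` helper, the `Option` return type (standing in for SQL NULL), and the plain `ArithmeticException` are hypothetical.
   
   ```scala
   object OverflowModesSketch {
     // Hypothetical labels for the three behaviors described in the comment.
     sealed trait Mode
     case object Default extends Mode // return the overflowed (wrapped) value
     case object Ansi extends Mode    // throw an overflow error
     case object TryCast extends Mode // return null on overflow
   
     // Narrow a Long to an Int, handling overflow according to the mode.
     def narrowToInt(value: Long, mode: Mode): Option[Int] =
       if (value == value.toInt) {
         Some(value.toInt) // no bits lost, all modes agree
       } else {
         mode match {
           case Default => Some(value.toInt) // keeps only the low 32 bits
           case Ansi    => throw new ArithmeticException(s"$value overflows INT")
           case TryCast => None
         }
       }
   }
   ```
   
   For example, `narrowToInt(4294967297L, Default)` wraps to `Some(1)`, while the same input under `TryCast` yields `None`.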





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on PR #43694:
URL: https://github.com/apache/spark/pull/43694#issuecomment-1800278638

   The failed test in `SparkConnectServiceE2ESuite` looks unrelated.




Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385910298


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   Oracle.
   `SELECT CAST('2' AS int) FROM DUAL;` outputs 2.
   `SELECT CAST(TO_TIMESTAMP('1900-05-05 18:34:56.10', 'YYYY-MM-DD HH24:MI:SS.FF') AS int) FROM DUAL;` outputs `SQL Error [932] [42000]: ORA-00932: inconsistent datatypes: expected NUMBER got TIMESTAMP`





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384455469


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   As far as I can tell, this has been the behavior from the beginning, which is why I wonder if I am missing anything. But indeed, casting timestamp to byte/short/int might be rare. Maybe that is why this was not caught? 🤔





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1423111720


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -1658,22 +1661,26 @@ case class Cast(
       integralType: String,
       from: DataType,
       to: DataType): CastFunction = {
-    if (ansiEnabled) {
-      val longValue = ctx.freshName("longValue")
-      val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
-      val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
-      (c, evPrim, _) =>
-        code"""
+
+    val longValue = ctx.freshName("longValue")
+    val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
+    val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
+
+    (c, evPrim, evNull) =>
+      val overflow = if (ansiEnabled) {

Review Comment:
   It's not a big deal, just found that we skipped overflow handling in the non-codegen path.





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385902467


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   It seems we should throw an error even without ANSI mode.





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385901987


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   I checked Postgres.
   `select CAST('32' as int);` output 32.
   `select CAST(TIMESTAMP '1900-05-05 18:34:56.1' as int);` output `SQL error [42846]: ERROR: cannot cast type timestamp without time zone to integer`
   `select CAST(TIMESTAMP WITH TIME ZONE '1900-05-05 18:34:56.1' as int);` output `SQL error [42846]: ERROR: cannot cast type timestamp with time zone to integer`.
   









Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385390260


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`.
 - Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions.
 - Since Spark 4.0, `spark.sql.parquet.compression.codec` drops the support of codec name `lz4raw`, please use `lz4_raw` instead.
+- Since Spark 4.0, when overflowing during casting timestamp to byte/short/int under non-ansi mode, Spark will return null instead a wrapping value.

Review Comment:
   Hmm, I think `TRY` mode already returns null (because it is treated like ANSI mode), so we can ignore it.
   
   Though as I remember, non-ANSI mode (`LEGACY` mode) returns null on overflow, following Hive.
   
   This is the documentation for LEGACY mode:
   
   > LEGACY: the default evaluation mode, which is compliant to Hive SQL.
   
   I have now verified Hive's behavior (4.0.0-beta-1, `hive.strict.timestamp.conversion`: false).
   
   I used the same test case from `CastSuiteBase`:
   
   ```
   0: jdbc:hive2://localhost:10000/> select cast(cast("1900-05-05 18:34:56.1" as timestamp) as int);
   +-------+
   |  _c0  |
   +-------+
   | NULL  |
   +-------+
   1 row selected (0.109 seconds)
   ```
   
   ```
   0: jdbc:hive2://localhost:10000/> select cast(cast("1900-05-05 18:34:56.1" as timestamp) as bigint);
   +--------------+
   |     _c0      |
   +--------------+
   | -2198208304  |
   +--------------+
   1 row selected (0.126 seconds)
   ```
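   
   As a worked check of the two Hive results above: the bigint value lies below the 32-bit minimum, so the int cast has to overflow. The sketch below is plain Scala arithmetic with hypothetical names, not Hive or Spark code. It also shows the value a wrapping cast would produce instead of NULL.
   
   ```scala
   object HiveExampleArithmetic {
     // Seconds since the epoch, taken from the bigint result shown above.
     val seconds: Long = -2198208304L
   
     // Int.MinValue is -2147483648, which is greater than `seconds`,
     // so the value does not fit in 32 bits.
     val fitsInInt: Boolean = seconds >= Int.MinValue && seconds <= Int.MaxValue
   
     // Truncation keeps the low 32 bits:
     // -2198208304 + 4294967296 = 2096758992.
     val wrapped: Int = seconds.toInt
   }
   ```
   
   Here `fitsInInt` is `false` and `wrapped` is `2096758992`, a positive number that no longer resembles the original timestamp.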
   
   
   
   





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1386095254


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -823,17 +826,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1.toShort else 0.toShort)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toShort) {
           longValue.toShort
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, ShortType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, ShortType)
+          } else {
+            null
+          }

Review Comment:
   Okay



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -1658,24 +1665,29 @@ case class Cast(
       integralType: String,
       from: DataType,
       to: DataType): CastFunction = {
-    if (ansiEnabled) {
-      val longValue = ctx.freshName("longValue")
-      val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
-      val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
-      (c, evPrim, _) =>
-        code"""
+
+    val longValue = ctx.freshName("longValue")
+    val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
+    val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
+
+    (c, evPrim, evNull) =>
+      val overflow = if (ansiEnabled) {
+        code"""throw QueryExecutionErrors.castingCauseOverflowError($c, $fromDt, $toDt);"""
+      } else {
+        code"$evNull = true;"
+      }
+
+      code"""
           long $longValue = ${timestampToLongCode(c)};
           if ($longValue == ($integralType) $longValue) {
             $evPrim = ($integralType) $longValue;
           } else {
-            throw QueryExecutionErrors.castingCauseOverflowError($c, $fromDt, $toDt);
+            $overflow
           }
         """
-    } else {
-      (c, evPrim, _) => code"$evPrim = ($integralType) ${timestampToLongCode(c)};"
-    }
   }
 
+

Review Comment:
   Okay





Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384455469


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   As far as I can tell, this has been the behavior from the beginning (commits from at least 8 years ago already behave this way), which is why I wondered whether I'm missing anything. That said, casting a timestamp to byte/short/int is probably rare, which may be why this was never caught. 🤔



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385908680


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`.
 - Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions.
 - Since Spark 4.0, `spark.sql.parquet.compression.codec` drops the support of codec name `lz4raw`, please use `lz4_raw` instead.
+- Since Spark 4.0, when overflowing during casting timestamp to byte/short/int under non-ansi mode, Spark will return null instead a wrapping value.

Review Comment:
   oh I see, makes sense



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer closed pull request #43694: [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers
URL: https://github.com/apache/spark/pull/43694


Re: [PR] [SPARK-45816][SQL] Return null when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384416801


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   While working on related timestamp-casting changes, I took a look at how Spark handles casting timestamp to integer. Unless there is a special reason for it, this behavior doesn't look correct to me, and it is not how overflow is commonly handled when casting other types under non-ANSI mode.
   
   



Re: [PR] [SPARK-45816][SQL] Return null when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384424672


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   Let me know if I'm missing anything here.



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385914250


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -823,17 +826,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1.toShort else 0.toShort)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toShort) {
           longValue.toShort
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, ShortType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, ShortType)
+          } else {
+            null
+          }

Review Comment:
   Shall we extract the logic as a common method?
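
One possible shape for such a common method, as a rough sketch only. It assumes nothing about Spark internals: `errorOrNull` here is a hypothetical standalone helper, `ArithmeticException` stands in for `QueryExecutionErrors.castingCauseOverflowError`, and the target type is passed as a plain string rather than a `DataType`.

```scala
object ErrorOrNullSketch {
  // Hypothetical stand-in for the shared overflow branch:
  // throw in ANSI mode, otherwise yield null (SQL NULL).
  def errorOrNull(value: Long, target: String, ansiEnabled: Boolean): Any = {
    if (ansiEnabled) {
      throw new ArithmeticException(s"Casting $value to $target causes overflow")
    } else {
      null
    }
  }

  // Each narrowing cast keeps its own round-trip check and delegates
  // only the overflow branch to the shared helper.
  def toByte(v: Long, ansiEnabled: Boolean): Any =
    if (v == v.toByte) v.toByte else errorOrNull(v, "TINYINT", ansiEnabled)

  def toShort(v: Long, ansiEnabled: Boolean): Any =
    if (v == v.toShort) v.toShort else errorOrNull(v, "SMALLINT", ansiEnabled)

  def toInt(v: Long, ansiEnabled: Boolean): Any =
    if (v == v.toInt) v.toInt else errorOrNull(v, "INT", ansiEnabled)
}
```

With this sketch, `ErrorOrNullSketch.toShort(70000L, ansiEnabled = false)` yields `null`, while the same call with `ansiEnabled = true` throws.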



Re: [PR] [SPARK-45816][SQL] Return null when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384417670


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala:
##########
@@ -514,9 +514,9 @@ class CastWithAnsiOffSuite extends CastSuiteBase {
       val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
       assert(negativeTs.getTime < 0)
       val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
-      checkEvaluation(cast(negativeTs, ByteType), expectedSecs.toByte)
-      checkEvaluation(cast(negativeTs, ShortType), expectedSecs.toShort)
-      checkEvaluation(cast(negativeTs, IntegerType), expectedSecs.toInt)

Review Comment:
   For example, the long value is -2198208304 (i.e., `negativeTs` in seconds), but the wrapped int value is 2096758992.
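
The wraparound can be reproduced outside Spark with plain Scala. This is only a sketch: the object name `TimestampWrapExample` and the helper `fitsInInt` are made up for illustration, and the constant is the seconds value quoted above.

```scala
object TimestampWrapExample {
  // Seconds value quoted in the comment above.
  val seconds: Long = -2198208304L

  // Narrowing a Long with toInt keeps only the low 32 bits, so the value wraps:
  // -2198208304 + 2^32 = 2096758992.
  val wrapped: Int = seconds.toInt

  // The check used in the diff: the value fits only if the round trip is lossless.
  def fitsInInt(v: Long): Boolean = v == v.toInt
}
```

Here `fitsInInt(seconds)` is false, which is the case where the change returns null under non-ANSI mode instead of the wrapped value.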
   



Re: [PR] [SPARK-45816][SQL] Return null when overflowing during casting from timestamp to integers [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384449877


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   Do you know when the current behavior was introduced, @viirya ?



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on PR #43694:
URL: https://github.com/apache/spark/pull/43694#issuecomment-1802381694

   Thank you @dongjoon-hyun @cloud-fan @beliefer @MaxGekk 


Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1423059585


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -1658,22 +1661,26 @@ case class Cast(
       integralType: String,
       from: DataType,
       to: DataType): CastFunction = {
-    if (ansiEnabled) {
-      val longValue = ctx.freshName("longValue")
-      val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
-      val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
-      (c, evPrim, _) =>
-        code"""
+
+    val longValue = ctx.freshName("longValue")
+    val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
+    val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
+
+    (c, evPrim, evNull) =>
+      val overflow = if (ansiEnabled) {

Review Comment:
   Yea (although we didn't do it before this PR either). Do you want me to create a follow-up?



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1384458242


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -785,17 +786,19 @@ case class Cast(
       buildCast[Boolean](_, b => if (b) 1 else 0)
     case DateType =>
       buildCast[Int](_, d => null)
-    case TimestampType if ansiEnabled =>
+    case TimestampType =>
       buildCast[Long](_, t => {
         val longValue = timestampToLong(t)
         if (longValue == longValue.toInt) {
           longValue.toInt
         } else {
-          throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          if (ansiEnabled) {
+            throw QueryExecutionErrors.castingCauseOverflowError(t, from, IntegerType)
+          } else {
+            null
+          }

Review Comment:
   Got it. Thank you for checking.



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1386236065


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -774,6 +775,14 @@ case class Cast(
       buildCast[Int](_, i => yearMonthIntervalToInt(i, x.startField, x.endField).toLong)
   }
 
+  private def errorOrNull(t: Any, from: DataType, to: DataType) = {

Review Comment:
   Yes, nice indeed.



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43694:
URL: https://github.com/apache/spark/pull/43694#issuecomment-1802191402

   Thank you so much, @viirya , @beliefer , @cloud-fan , @MaxGekk .


Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1385911831


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -1658,24 +1665,29 @@ case class Cast(
       integralType: String,
       from: DataType,
       to: DataType): CastFunction = {
-    if (ansiEnabled) {
-      val longValue = ctx.freshName("longValue")
-      val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
-      val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
-      (c, evPrim, _) =>
-        code"""
+
+    val longValue = ctx.freshName("longValue")
+    val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
+    val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
+
+    (c, evPrim, evNull) =>
+      val overflow = if (ansiEnabled) {
+        code"""throw QueryExecutionErrors.castingCauseOverflowError($c, $fromDt, $toDt);"""
+      } else {
+        code"$evNull = true;"
+      }
+
+      code"""
           long $longValue = ${timestampToLongCode(c)};
           if ($longValue == ($integralType) $longValue) {
             $evPrim = ($integralType) $longValue;
           } else {
-            throw QueryExecutionErrors.castingCauseOverflowError($c, $fromDt, $toDt);
+            $overflow
           }
         """
-    } else {
-      (c, evPrim, _) => code"$evPrim = ($integralType) ${timestampToLongCode(c)};"
-    }
   }
 
+

Review Comment:
   Please revert this line.



Re: [PR] [SPARK-45816][SQL] Return `NULL` when overflowing during casting from timestamp to integers [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43694:
URL: https://github.com/apache/spark/pull/43694#discussion_r1423032246


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -1658,22 +1661,26 @@ case class Cast(
       integralType: String,
       from: DataType,
       to: DataType): CastFunction = {
-    if (ansiEnabled) {
-      val longValue = ctx.freshName("longValue")
-      val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
-      val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
-      (c, evPrim, _) =>
-        code"""
+
+    val longValue = ctx.freshName("longValue")
+    val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
+    val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
+
+    (c, evPrim, evNull) =>
+      val overflow = if (ansiEnabled) {

Review Comment:
   I think we can skip the overflow handling if the `to` data type is LongType?
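
A rough sketch of what skipping the check could look like, assuming the same template is also reached with a `long` target. Everything here is hypothetical: `castCode` builds plain strings instead of using Spark's `code` interpolator, the temporary is named `tmp` rather than coming from `ctx.freshName`, and `ArithmeticException` stands in for the real overflow error.

```scala
object CastCodeSketch {
  // Builds a Java snippet as a plain string; a simplified stand-in for the
  // generated-code template in the diff above.
  def castCode(
      input: String,
      result: String,
      isNull: String,
      integralType: String,
      ansiEnabled: Boolean): String = {
    if (integralType == "long") {
      // A long always holds the seconds value, so no range check is emitted.
      s"$result = $input;"
    } else {
      val overflow =
        if (ansiEnabled) """throw new ArithmeticException("overflow");"""
        else s"$isNull = true;"
      s"""long tmp = $input;
         |if (tmp == ($integralType) tmp) {
         |  $result = ($integralType) tmp;
         |} else {
         |  $overflow
         |}""".stripMargin
    }
  }
}
```

The narrower targets still get the round-trip check, and only the overflow branch differs between ANSI and non-ANSI mode.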


