Posted to reviews@spark.apache.org by "chenhao-db (via GitHub)" <gi...@apache.org> on 2023/03/03 06:51:02 UTC

[GitHub] [spark] chenhao-db opened a new pull request, #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

chenhao-db opened a new pull request, #40264:
URL: https://github.com/apache/spark/pull/40264

   This is a backport of #40237.
   
   ### What changes were proposed in this pull request?
   This PR fixes the counter-intuitive behaviors of the `TimestampAdd` expression described in https://issues.apache.org/jira/browse/SPARK-42635. See the *user-facing* changes below for details.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. This PR fixes the three problems mentioned in SPARK-42635:
   
   1. When the timestamp is close to a daylight saving time transition, the result may be discontinuous and non-monotonic.
   2. Adding months, quarters, or years silently ignores `Int` overflow during unit conversion.
   3. Adding sub-month units (week, day, hour, minute, second, millisecond, microsecond) silently ignores `Long` overflow during unit conversion.
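
A minimal sketch of how unchecked unit conversion can produce the wrapped values behind problems 2 and 3. It is not Spark's implementation: Python integers never overflow, so the hypothetical helper `wrap` simulates fixed-width two's-complement arithmetic, and the hypothetical helper `multiply_exact` shows the checked alternative that raises instead of wrapping. The input values are the ones used in the UTC examples of this description.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
MONTHS_PER_QUARTER = 3
MICROS_PER_DAY = 86_400 * 1_000_000


def wrap(value, bits):
    """Simulate unchecked two's-complement arithmetic at the given bit width."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def multiply_exact(a, b, lo, hi):
    """Multiply and raise instead of wrapping when the product leaves [lo, hi]."""
    product = a * b
    if not lo <= product <= hi:
        raise OverflowError(f"{a} * {b} does not fit in [{lo}, {hi}]")
    return product


# Quarters -> months in 32 bits: the product wraps to a small negative count.
wrapped_months = wrap(1431655764 * MONTHS_PER_QUARTER, 32)
print(wrapped_months)  # -4, i.e. four months before 1970-01-01

# Days -> microseconds in 64 bits: the product wraps to a large negative count.
wrapped_micros = wrap(106751992 * MICROS_PER_DAY, 64)
print(wrapped_micros)  # -9223371964909551616

# The checked conversion reports the overflow instead of returning a wrong value.
try:
    multiply_exact(106751992, MICROS_PER_DAY, INT64_MIN, INT64_MAX)
except OverflowError as err:
    print("overflow:", err)
```

The wrapped month count of -4 is consistent with the old `1969-09-01` result for the quarter example, and the negative microsecond count is consistent with the old result landing far before the epoch.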
   
   Some examples of the result changes:
   
   Old results:
   
   ```
   // In America/Los_Angeles timezone:
   timestampadd(DAY, 1, 2011-03-12 03:00:00) = 2011-03-13 03:00:00 (correct; shown for comparison)
   timestampadd(HOUR, 23, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
   timestampadd(HOUR, 24, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
   timestampadd(SECOND, 86400 - 1, 2011-03-12 03:00:00) = 2011-03-13 03:59:59
   timestampadd(SECOND, 86400, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
   // In UTC timezone:
   timestampadd(quarter, 1431655764, 1970-01-01 00:00:00) = 1969-09-01 00:00:00
   timestampadd(day, 106751992, 1970-01-01 00:00:00) = -290308-12-22 15:58:10.448384
   ```
   
   New results:
   
   ```
   // In America/Los_Angeles timezone:
   timestampadd(DAY, 1, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
   timestampadd(HOUR, 23, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
   timestampadd(HOUR, 24, 2011-03-12 03:00:00) = 2011-03-13 04:00:00
   timestampadd(SECOND, 86400 - 1, 2011-03-12 03:00:00) = 2011-03-13 03:59:59
   timestampadd(SECOND, 86400, 2011-03-12 03:00:00) = 2011-03-13 04:00:00
   // In UTC timezone:
   timestampadd(quarter, 1431655764, 1970-01-01 00:00:00) throws an overflow exception
   timestampadd(day, 106751992, 1970-01-01 00:00:00) throws an overflow exception
   ```
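
The new America/Los_Angeles results above can be reproduced with a small standalone sketch. It is not Spark's code and rests on one assumption, which is consistent with the results listed: `DAY` moves the local wall-clock date, while `HOUR` and `SECOND` add elapsed time to the underlying instant. The helper names `add_elapsed`, `add_wall_clock_days`, and `show` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database or the tzdata package

LA = ZoneInfo("America/Los_Angeles")


def add_elapsed(ts, delta):
    """Add delta as elapsed time: shift the instant in UTC, then convert back."""
    return (ts.astimezone(timezone.utc) + delta).astimezone(ts.tzinfo)


def add_wall_clock_days(ts, days):
    """Add whole days to the local wall-clock fields, keeping the time zone."""
    return ts + timedelta(days=days)


def show(ts):
    """Format the local wall-clock time without the UTC offset."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


# One day before the 2011 spring-forward transition in Los Angeles.
start = datetime(2011, 3, 12, 3, 0, 0, tzinfo=LA)

print(show(add_wall_clock_days(start, 1)))               # 2011-03-13 03:00:00
print(show(add_elapsed(start, timedelta(hours=23))))     # 2011-03-13 03:00:00
print(show(add_elapsed(start, timedelta(hours=24))))     # 2011-03-13 04:00:00
print(show(add_elapsed(start, timedelta(seconds=86399))))  # 2011-03-13 03:59:59
print(show(add_elapsed(start, timedelta(seconds=86400))))  # 2011-03-13 04:00:00
```

Because the elapsed-time additions are done on the instant rather than on the local fields, a larger amount never yields an earlier local time, which is the monotonic behavior the fix is after.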
   
   
   ### How was this patch tested?
   
   Existing tests pass, and new tests were added.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression
URL: https://github.com/apache/spark/pull/40264




[GitHub] [spark] chenhao-db commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "chenhao-db (via GitHub)" <gi...@apache.org>.
chenhao-db commented on PR #40264:
URL: https://github.com/apache/spark/pull/40264#issuecomment-1454660710

   @MaxGekk I see. In older versions, Spark doesn't include the error class in the error message: https://github.com/apache/spark/blob/branch-3.3/core/src/main/scala/org/apache/spark/ErrorInfo.scala#L74. I have removed the error class prefix from the expected error message.




[GitHub] [spark] MaxGekk commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40264:
URL: https://github.com/apache/spark/pull/40264#issuecomment-1454657486

   Seems like the test failure is related to the changes:
   ```
   [info] - SPARK-42635: timestampadd unit conversion overflow *** FAILED *** (12 milliseconds)
   [info]   (non-codegen mode) Expected error message is `[DATETIME_OVERFLOW] Datetime operation overflow`, but `Datetime operation overflow: add 106751992 DAY to TIMESTAMP '1970-01-01 00:00:00'.` found (ExpressionEvalHelper.scala:176)
   ```




[GitHub] [spark] chenhao-db commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "chenhao-db (via GitHub)" <gi...@apache.org>.
chenhao-db commented on PR #40264:
URL: https://github.com/apache/spark/pull/40264#issuecomment-1453062944

   @MaxGekk Please take a look, thanks for reviewing!




[GitHub] [spark] MaxGekk commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40264:
URL: https://github.com/apache/spark/pull/40264#issuecomment-1454846526

   +1, LGTM. All GAs passed. Merging to 3.3.
   Thank you, @chenhao-db.




[GitHub] [spark] MaxGekk commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40264:
URL: https://github.com/apache/spark/pull/40264#issuecomment-1453222106

   @chenhao-db Could you fix the build errors:
   ```
   [error] /home/runner/work/apache-spark/apache-spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala:2047:7: not found: value checkErrorInExpression
   [error]       checkErrorInExpression[SparkArithmeticException](TimestampAdd("DAY",
   [error]       ^
   [error] /home/runner/work/apache-spark/apache-
   ```




[GitHub] [spark] chenhao-db commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

Posted by "chenhao-db (via GitHub)" <gi...@apache.org>.
chenhao-db commented on PR #40264:
URL: https://github.com/apache/spark/pull/40264#issuecomment-1453847644

   @MaxGekk It seems that `checkErrorInExpression` doesn't exist in 3.3, so I still have to use the old `checkExceptionInExpression`. Is that okay?

