You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/05/18 13:15:40 UTC

[GitHub] [arrow] paleolimbot opened a new pull request, #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

paleolimbot opened a new pull request, #35671:
URL: https://github.com/apache/arrow/pull/35671

   Alternative approach to #35634.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic merged pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic merged PR #35671:
URL: https://github.com/apache/arrow/pull/35671


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35671:
URL: https://github.com/apache/arrow/pull/35671#discussion_r1204056042


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -61,12 +61,14 @@ register_bindings_datetime_utility <- function() {
         tz <- Sys.timezone()
       }
 
-      # if a timestamp does not contain timezone information (i.e. it is
+      # If a timestamp does not contain timezone information (i.e. it is
       # "timezone-naive") we can attach timezone information (i.e. convert it into
       # a "timezone-aware" timestamp) with `assume_timezone`
       # if we want to cast to a different timezone, we can only do it for
-      # timezone-aware timestamps, not for timezone-naive ones
-      if (!is.null(tz)) {
+      # timezone-aware timestamps, not for timezone-naive ones.
+      # strptime in Acero will return a timezone-aware timestamp if %z is
+      # part of the format string.
+      if (!is.null(tz) && !grepl("%z", format, fixed = TRUE)) {

Review Comment:
   I don't know if this helps with understanding dates/times in R, but the `tz` argument won't affect the underlying point in time (just the attribute of the output and, to great confusion of many, how it is printed):
   
   ``` r
   unclass(as.POSIXct(strptime("2020-05-01T00:00+0100", format="%Y-%m-%dT%H:%M%z")))
   #> [1] 1588287600
   #> attr(,"tzone")
   #> [1] ""
   unclass(as.POSIXct(strptime("2020-05-01T00:00+0100", tz = "UTC", format="%Y-%m-%dT%H:%M%z")))
   #> [1] 1588287600
   #> attr(,"tzone")
   #> [1] "UTC"
   unclass(as.POSIXct(strptime("2020-05-01T00:00+0100", tz = "America/Halifax", format="%Y-%m-%dT%H:%M%z")))
   #> [1] 1588287600
   #> attr(,"tzone")
   #> [1] "America/Halifax"
   ```
   
   <sup>Created on 2023-05-24 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35671:
URL: https://github.com/apache/arrow/pull/35671#issuecomment-1553040938

   * Closes: #35633


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35671:
URL: https://github.com/apache/arrow/pull/35671#issuecomment-1556069860

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d78ef390289b4d15a97b41ab6bbda307...c86c7534b5b645bc8e4e34450dd32b96/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35671:
URL: https://github.com/apache/arrow/pull/35671#discussion_r1205376906


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -61,12 +61,14 @@ register_bindings_datetime_utility <- function() {
         tz <- Sys.timezone()
       }
 
-      # if a timestamp does not contain timezone information (i.e. it is
+      # If a timestamp does not contain timezone information (i.e. it is
       # "timezone-naive") we can attach timezone information (i.e. convert it into
       # a "timezone-aware" timestamp) with `assume_timezone`
       # if we want to cast to a different timezone, we can only do it for
-      # timezone-aware timestamps, not for timezone-naive ones
-      if (!is.null(tz)) {
+      # timezone-aware timestamps, not for timezone-naive ones.
+      # strptime in Acero will return a timezone-aware timestamp if %z is
+      # part of the format string.
+      if (!is.null(tz) && !grepl("%z", format, fixed = TRUE)) {

Review Comment:
   > It's a good point that this doesn't match what base R does...I think we'd want to `cast()` if `grepl("%z")` and `assume_timezone()` otherwise?
   
   Yes, that sounds correct
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35671:
URL: https://github.com/apache/arrow/pull/35671#issuecomment-1556069444

   Benchmark runs are scheduled for baseline = dd33cabdc40fc28f85e6f80bbf572f8ddfeae9ef and contender = 9039ee2c0082b74a2dcfd443f9d136aa62d30531. 9039ee2c0082b74a2dcfd443f9d136aa62d30531 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/8a727439ae634a64a45d3fcba5629508...4c67d8d58650431da01027864a4e3ca7/)
   [Finished :arrow_down:0.21% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/35fc929c69664716a1228eedfcc238f3...22ce5e5c1c134955be4a1182af9a950e/)
   [Finished :arrow_down:2.29% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d78ef390289b4d15a97b41ab6bbda307...c86c7534b5b645bc8e4e34450dd32b96/)
   [Finished :arrow_down:0.12% :arrow_up:0.24%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/991679fb50a1429aaffdcc46c9b2e84e...528aaaa134594b51812e2db3bcbcaa18/)
   Buildkite builds:
   [Finished] [`9039ee2c` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2897)
   [Finished] [`9039ee2c` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2933)
   [Finished] [`9039ee2c` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2898)
   [Finished] [`9039ee2c` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2923)
   [Finished] [`dd33cabd` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2896)
   [Finished] [`dd33cabd` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2932)
   [Finished] [`dd33cabd` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2897)
   [Finished] [`dd33cabd` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2922)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35671:
URL: https://github.com/apache/arrow/pull/35671#discussion_r1203809343


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -61,12 +61,14 @@ register_bindings_datetime_utility <- function() {
         tz <- Sys.timezone()
       }
 
-      # if a timestamp does not contain timezone information (i.e. it is
+      # If a timestamp does not contain timezone information (i.e. it is
       # "timezone-naive") we can attach timezone information (i.e. convert it into
       # a "timezone-aware" timestamp) with `assume_timezone`
       # if we want to cast to a different timezone, we can only do it for
-      # timezone-aware timestamps, not for timezone-naive ones
-      if (!is.null(tz)) {
+      # timezone-aware timestamps, not for timezone-naive ones.
+      # strptime in Acero will return a timezone-aware timestamp if %z is
+      # part of the format string.
+      if (!is.null(tz) && !grepl("%z", format, fixed = TRUE)) {

Review Comment:
   I don't understand how timezones work in R (and with POSIXlt and POSIXct), but with the above change, the `tz` keyword will just be ignored when the string includes an offset? 
   
   That doesn't match fully what base R does:
   
   ```
   > strptime("2020-05-01T00:00+0100", format="%Y-%m-%dT%H:%M%z")
   [1] "2020-05-01 01:00:00"
   > strptime("2020-05-01T00:00+0100", format="%Y-%m-%dT%H:%M%z", tz="US/Eastern")
   [1] "2020-04-30 19:00:00"
   ```
   
   (but again, I don't really know how to interpret those returned values, and whether they are tz "aware" or "native" or even if those concepts even exist in R. I looked at the `$zone` and `$gmtoff` attributes of the above return values, but can't make sense of it)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35671: GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35671:
URL: https://github.com/apache/arrow/pull/35671#discussion_r1204045935


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -61,12 +61,14 @@ register_bindings_datetime_utility <- function() {
         tz <- Sys.timezone()
       }
 
-      # if a timestamp does not contain timezone information (i.e. it is
+      # If a timestamp does not contain timezone information (i.e. it is
       # "timezone-naive") we can attach timezone information (i.e. convert it into
       # a "timezone-aware" timestamp) with `assume_timezone`
       # if we want to cast to a different timezone, we can only do it for
-      # timezone-aware timestamps, not for timezone-naive ones
-      if (!is.null(tz)) {
+      # timezone-aware timestamps, not for timezone-naive ones.
+      # strptime in Acero will return a timezone-aware timestamp if %z is
+      # part of the format string.
+      if (!is.null(tz) && !grepl("%z", format, fixed = TRUE)) {

Review Comment:
   We don't have a naive datetime type in R...everything is timezone aware, and `tz = ""` means "your local timezone" (but everything is UTC internally). What you've encountered here is a fun type called the `POSIXlt` which is sort of like a data frame that stores components in separate vectors (the `POSIXct` is a more normal version of it, which is seconds from the unix epoch as a double vector).
   
   It's a good point that this doesn't match what base R does...I think we'd want to `cast()` if `grepl("%z")` and `assume_timezone()` otherwise?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org