Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/03/02 18:17:13 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #3794: Timezone Aware Timestamp Parsing

tustvold opened a new issue, #3794:
URL: https://github.com/apache/arrow-rs/issues/3794

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   `string_to_timestamp_nanos` contains logic to parse a timestamp-like string into nanoseconds since the UTC epoch.
   
   The semantics for this are well defined for timestamps that include a timezone, e.g. `1997-01-31 09:26:56.123Z` or `1997-01-31T09:26:56.123-05:00`. However, the semantics are ambiguous for timestamps without one, such as `1997-01-31T09:26:56.123`.
   
   As pointed out by @MachaelLee on https://github.com/apache/arrow-rs/pull/3787, prior to https://github.com/apache/arrow-rs/pull/2814 a timestamp string without a timezone was interpreted as being in the system's local timezone, and this is still what the function docs state. https://github.com/apache/arrow-rs/pull/2814 by @waitingkuo changed this so that such strings are instead parsed as UTC.
   
   There are at least three "correct" behaviours when parsing a string without an embedded timezone, such as `1997-01-31T09:26:56.123`, depending on the context:
   
   * When parsing a user-provided timestamp, use the system [`Local`](https://docs.rs/chrono/latest/chrono/offset/struct.Local.html) timezone and convert back to UTC (**what it used to do**)
   * When parsing a string to a datatype without a timezone, assume UTC (**what it currently does**)
   * When parsing a string to a Timestamp column with a timezone, assume the timestamp is in the given [Tz](https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html) and convert back to UTC
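   
   To make the difference between these behaviours concrete, here is a minimal std-only sketch (not the arrow-rs or chrono implementation). `NaiveTimestamp`, `days_from_civil` and `to_utc_nanos` are hypothetical names introduced for illustration, and a fixed offset in seconds stands in for `Local` or `Tz`, which in practice need a timezone database lookup because of DST.
   
   ```rust
   /// Wall-clock fields of a timestamp string that carries no timezone.
   /// Hypothetical type, for illustration only.
   struct NaiveTimestamp {
       year: i64,
       month: i64,
       day: i64,
       hour: i64,
       minute: i64,
       second: i64,
       nanos: i64,
   }
   
   /// Days between 1970-01-01 and the given proleptic Gregorian date.
   fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
       // Treat March as the first month so the leap day falls at year end.
       let y = if month <= 2 { year - 1 } else { year };
       let era = y.div_euclid(400);
       let year_of_era = y.rem_euclid(400);
       let shifted_month = (month + 9) % 12; // March = 0, ..., February = 11
       let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
       let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
       era * 146_097 + day_of_era - 719_468
   }
   
   /// Interpret `t` as wall-clock time in a zone `offset_secs` east of UTC
   /// and return nanoseconds since the UTC epoch.
   fn to_utc_nanos(t: &NaiveTimestamp, offset_secs: i64) -> i64 {
       let local_secs = days_from_civil(t.year, t.month, t.day) * 86_400
           + t.hour * 3_600
           + t.minute * 60
           + t.second;
       // Subtracting the offset converts local wall-clock seconds to UTC.
       (local_secs - offset_secs) * 1_000_000_000 + t.nanos
   }
   ```
   
   For `1997-01-31T09:26:56.123`, `to_utc_nanos(&t, 0)` gives `854702816123000000` (the current UTC behaviour), while `to_utc_nanos(&t, -5 * 3_600)` gives `854720816123000000`, five hours later, which is what a `-05:00` local or column timezone would produce.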
   
   **Describe the solution you'd like**
   
   Provide a function with the signature
   
   ```
   pub fn string_to_datetime<T: TimeZone>(t: &T, s: &str) -> Result<DateTime<T>>
   ```
   
   This could then be used with [Local](https://docs.rs/chrono/latest/chrono/offset/struct.Local.html), [Utc](https://docs.rs/chrono/latest/chrono/offset/struct.Utc.html), or [Tz](https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html) as appropriate.
   
   We could then update `string_to_timestamp_nanos` to be something like
   
   ```
   pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
       to_timestamp_nanos(string_to_datetime(&Utc, s)?.naive_utc())
   }
   ```
   
   We could then possibly deprecate `string_to_timestamp_nanos`, as it has rather confusing semantics.
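   
   One way to picture the intended precedence is the std-only sketch below. It assumes the supplied timezone is only consulted when the string carries no offset of its own. `two_digits`, `split_offset` and `effective_offset` are hypothetical helpers, not part of arrow-rs or chrono, and only `Z` and `±HH:MM` suffixes are recognised. A fixed offset in seconds again stands in for a real timezone.
   
   ```rust
   /// Parse exactly two ASCII digits.
   fn two_digits(b: &[u8]) -> Option<i64> {
       match b {
           [hi @ b'0'..=b'9', lo @ b'0'..=b'9'] => {
               Some(i64::from(hi - b'0') * 10 + i64::from(lo - b'0'))
           }
           _ => None,
       }
   }
   
   /// Split a timestamp string into its date-time part and an optional
   /// embedded UTC offset in seconds (`Z`, `+HH:MM` or `-HH:MM`).
   fn split_offset(s: &str) -> (&str, Option<i64>) {
       if let Some(rest) = s.strip_suffix('Z') {
           return (rest, Some(0));
       }
       let bytes = s.as_bytes();
       if bytes.len() < 6 {
           return (s, None);
       }
       // A numeric offset is exactly six ASCII bytes: sign, HH, ':', MM.
       let cut = bytes.len() - 6;
       let tail = &bytes[cut..];
       let sign = match tail[0] {
           b'+' => 1,
           b'-' => -1,
           _ => return (s, None),
       };
       match (two_digits(&tail[1..3]), tail[3], two_digits(&tail[4..6])) {
           (Some(h), b':', Some(m)) => (&s[..cut], Some(sign * (h * 3_600 + m * 60))),
           _ => (s, None),
       }
   }
   
   /// The embedded offset wins; otherwise fall back to the caller's zone.
   fn effective_offset(s: &str, default_offset_secs: i64) -> (&str, i64) {
       let (rest, embedded) = split_offset(s);
       (rest, embedded.unwrap_or(default_offset_secs))
   }
   ```
   
   Under this sketch, `effective_offset("1997-01-31T09:26:56.123-05:00", 0)` keeps the embedded `-05:00`, whereas `effective_offset("1997-01-31T09:26:56.123", 0)` falls back to the caller's offset, which mirrors passing `Utc` for strings without a timezone.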
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on issue #3794: Timezone Aware Timestamp Parsing

Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie commented on issue #3794:
URL: https://github.com/apache/arrow-rs/issues/3794#issuecomment-1452599948

   Am I correct that this is an improvement to the API, and the semantics of interpreting a timestamp without an offset is still to interpret as UTC?




[GitHub] [arrow-rs] tustvold commented on issue #3794: Timezone Aware Timestamp Parsing

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3794:
URL: https://github.com/apache/arrow-rs/issues/3794#issuecomment-1452601335

   > the semantics of interpreting a timestamp without an offset is still to interpret as UTC
   
   Correct. There is a separate ticket to track interpreting such timestamps with respect to the target timezone.




[GitHub] [arrow-rs] tustvold closed issue #3794: Timezone Aware Timestamp Parsing

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #3794: Timezone Aware Timestamp Parsing
URL: https://github.com/apache/arrow-rs/issues/3794




[GitHub] [arrow-rs] alamb commented on issue #3794: Timezone Aware Timestamp Parsing

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3794:
URL: https://github.com/apache/arrow-rs/issues/3794#issuecomment-1452377816

   > pub fn string_to_datetime<T: TimeZone>(t: &T, s: &str) -> Result<DateTime<T>>
   
   Makes sense to me
   
   @waitingkuo  what do you think?




[GitHub] [arrow-rs] tustvold commented on issue #3794: Timezone Aware Timestamp Parsing

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3794:
URL: https://github.com/apache/arrow-rs/issues/3794#issuecomment-1463940324

   `label_issue.py` automatically added labels {'arrow'} from #3795




[GitHub] [arrow-rs] alamb commented on issue #3794: Timezone Aware Timestamp Parsing

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3794:
URL: https://github.com/apache/arrow-rs/issues/3794#issuecomment-1454139195

   Proposed implementation in https://github.com/apache/arrow-rs/pull/3795

