You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/03 20:55:23 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #3795: Timezone aware timestamp parsing (#3794)

alamb commented on code in PR #3795:
URL: https://github.com/apache/arrow-rs/pull/3795#discussion_r1124995968


##########
arrow-cast/src/parse.rs:
##########
@@ -112,34 +80,44 @@ pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
     // without a timezone specifier as a local time, using T as a separator
     // Example: 2020-09-08T13:42:29.190855
     if let Ok(ts) = NaiveDateTime::parse_from_str(s, "%Y-%m-%dT%H:%M:%S%.f") {
-        return to_timestamp_nanos(ts);
+        if let Some(offset) = timezone.offset_from_local_datetime(&ts).single() {

Review Comment:
   : https://docs.rs/chrono-tz/0.8.1/chrono_tz/enum.Tz.html#method.offset_from_local_datetime



##########
arrow-cast/src/parse.rs:
##########
@@ -153,6 +131,42 @@ pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
     )))
 }
 
+/// Accepts a string in RFC3339 / ISO8601 standard format and some
+/// variants and converts it to a nanosecond precision timestamp.
+///
+/// See [`string_to_datetime`] for the full set of supported formats
+///
+/// Implements the `to_timestamp` function to convert a string to a
+/// timestamp, following the model of spark SQL’s to_`timestamp`.
+///
+/// Internally, this function uses the `chrono` library for the
+/// datetime parsing
+///
+/// We hope to extend this function in the future with a second
+/// parameter to specifying the format string.
+///
+/// ## Timestamp Precision
+///
+/// Function uses the maximum precision timestamps supported by
+/// Arrow (nanoseconds stored as a 64-bit integer) timestamps. This
+/// means the range of dates that timestamps can represent is ~1677 AD
+/// to 2262 AM
+///
+/// ## Timezone / Offset Handling
+///
+/// Numerical values of timestamps are stored compared to offset UTC.
+///
+/// This function interprets string without an explicit time zone timestamps

Review Comment:
   ```suggestion
   /// This function interprets string without an explicit time zone as timestamps
   ```



##########
arrow-cast/src/parse.rs:
##########
@@ -153,6 +131,42 @@ pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
     )))
 }
 
+/// Accepts a string in RFC3339 / ISO8601 standard format and some
+/// variants and converts it to a nanosecond precision timestamp.
+///
+/// See [`string_to_datetime`] for the full set of supported formats
+///
+/// Implements the `to_timestamp` function to convert a string to a
+/// timestamp, following the model of spark SQL’s to_`timestamp`.
+///
+/// Internally, this function uses the `chrono` library for the
+/// datetime parsing
+///
+/// We hope to extend this function in the future with a second
+/// parameter to specifying the format string.
+///
+/// ## Timestamp Precision
+///
+/// Function uses the maximum precision timestamps supported by
+/// Arrow (nanoseconds stored as a 64-bit integer) timestamps. This
+/// means the range of dates that timestamps can represent is ~1677 AD
+/// to 2262 AM
+///
+/// ## Timezone / Offset Handling
+///
+/// Numerical values of timestamps are stored compared to offset UTC.
+///
+/// This function interprets string without an explicit time zone timestamps
+/// relative to UTC, see [`string_to_datetime`] for alternative semantics
+///
+/// For example, both `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`,
+/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value
+///

Review Comment:
   
   I think some context got lost -- I only think this statement is true if the system timezone is set to UTC - 5:
   
   ```suggestion
   /// For example, `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`,
   /// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value
   /// if the system timezone is set to Americas/New_York (UTC-5).
   ```
   



##########
arrow-cast/src/parse.rs:
##########
@@ -76,12 +42,14 @@ use chrono::prelude::*;
 ///     "2023-01-01 040506 +07:30:00",
 ///     "2023-01-01 04:05:06.789 PST",
 ///     "2023-01-01 04:05:06.789 -08",
-#[inline]
-pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
+pub fn string_to_datetime<T: TimeZone>(

Review Comment:
   I don't fully understand this question 
   
   I would think you could call `string_to_datetime<Utc>(...)` which would give you a chrono `DateTime<Utc>` and you can then then do whatever you want with the result (e.g. convert it into a nanosecond timestamp, etc)
   



##########
arrow-cast/src/parse.rs:
##########
@@ -614,6 +629,34 @@ mod tests {
             naive_datetime.timestamp_nanos(),
             parse_timestamp("2020-09-08 13:42:29").unwrap()
         );
+
+        let tz: Tz = "+02:00".parse().unwrap();
+        let date = string_to_datetime(&tz, "2020-09-08 13:42:29").unwrap();
+        let utc = date.naive_utc().to_string();
+        assert_eq!(utc, "2020-09-08 11:42:29");
+        let local = date.naive_local().to_string();
+        assert_eq!(local, "2020-09-08 13:42:29");
+
+        let date = string_to_datetime(&tz, "2020-09-08 13:42:29Z").unwrap();
+        let utc = date.naive_utc().to_string();
+        assert_eq!(utc, "2020-09-08 13:42:29");
+        let local = date.naive_local().to_string();
+        assert_eq!(local, "2020-09-08 15:42:29");
+
+        let dt =
+            NaiveDateTime::parse_from_str("2020-09-08T13:42:29Z", "%Y-%m-%dT%H:%M:%SZ")
+                .unwrap();
+        let local: Tz = "+08:00".parse().unwrap();

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org