You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/04 11:25:58 UTC

[GitHub] [arrow-datafusion] waitingkuo opened a new issue, #4106: Support `SET` timezone to non-UTC time zone

waitingkuo opened a new issue, #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 
   (This section helps Arrow developers understand the context and *why* for this feature, in addition to  the *what*)
   
   I'd like to add the support for `SET TIMEZONE`.
   I'd like to have `now()` and `TimestampTz` use the timezone in config_options instead of the fixed UTC
   
   **Describe the solution you'd like**
   A clear and concise description of what you want to happen.
   
   1. enable `SET TIME TO [SOME_TIMEZONE]`
   It's currently disabled on purpose
   
   https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2484-L2489
   
   i'd like to remove it (probably replaced by some tz verification) and have this as the result
   ```bash
   ❯ set time zone to '+08:00';
   0 rows in set. Query took 0.000 seconds.
   ❯ show time zone;
   +--------------------------------+---------+
   | name                           | setting |
   +--------------------------------+---------+
   | datafusion.execution.time_zone | +08:00  |
   +--------------------------------+---------+
   1 row in set. Query took 0.007 seconds.
   ```
   
   
   2. `now()` returns the `Timestamp<TimeUnit::Nanosecond, Some("SOME_TIMEZONE")` according to the time zone we have. it's currently fixed as "UTC"
   https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/physical-expr/src/datetime_expressions.rs#L176-L186
   
   `Some("UTC".to_owned())` should be modified. we need to pass `config_options` or `time zone` into here this function
   
   e.g.
   current version
   ```bash
   ❯ set time zone to '+08:00';
   0 rows in set. Query took 0.000 seconds.
   ❯ select arrow_typeof(now());
   +------------------------------------+
   | arrowtypeof(now())                 |
   +------------------------------------+
   | Timestamp(Nanosecond, Some("UTC")) |
   +------------------------------------+
   1 row in set. Query took 0.003 seconds.
   ```
   becomes
   ```bash
   ❯ set time zone to '+08:00';
   0 rows in set. Query took 0.000 seconds.
   ❯ select arrow_typeof(now());
   +---------------------------------------+
   | arrowtypeof(now())                    |
   +---------------------------------------+
   | Timestamp(Nanosecond, Some("+08:00")) |
   +---------------------------------------+
   1 row in set. Query took 0.003 seconds.
   ```
   
   3. `TimestampTz` should use the timezone from `config_options`
   
   https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2832-L2841
   
   we currently fixed `TimestampTz` as "UTC" without considering `config_options`'s timezone.
   to enable it to consider `config_options`, we need to
   
   3-1. make `convert_simple_data_type` as a method of `SqlToRel`
   
   https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2811-L2817
   
   3-2. Add `get_config_option` in `ContextProvider` so that we could get the time zone
   
   https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2832-L2841
   
   3-3 then we can use the timezone in `config_options` here to replace fixed UTC
   https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2832-L2841
   
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features you've considered.
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1307403724

   > chrono-tz use
   
   Very informative, ty!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
alamb closed issue #4106: Support `SET` timezone to non-UTC time zone
URL: https://github.com/apache/arrow-datafusion/issues/4106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1307407980

   > perhaps we could force user to add the timezone offset
   
   That would definitely eliminate the ambiguity. I'd love to see if others have opinions about this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1303316518

   i added a POC to showcase some functionalities 
   #4107 
   
   perhaps we should break it to several pc later


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1307081076

   @avantgardnerio 
   
   perhaps we could force user to add the timezone offset while casting string to timestamptz
   
   e.g. 
   correct
   `2000-01-01T02:00:00+00:00`
   `2000-01-01T02:00:00-06:00`
   `2000-01-01T02:00:00-07:00`
   `2000-01-01T02:00:00Z` (this is a special case)
   
   note that if we set time zone as +08:00
   `2000-01-01T02:00:00-06:00`  becomes`2000-01-01T16:00:00+08:00`
   
   incorrect
   `2000-03-13T02:30:00` -> this is the one might be illegal in some timezone
   `2000-03-01T02:30:00` -> this should work for all the timezone but still disallow it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1304498803

   > @waitingkuo thank you for your clear communication as always! You have the most reviewable PRs and issues I've seen 😄 .
   
   ❤️ 
   
   > 
   > The issue looks good to me, but I would propose tackling in stages:
   > 
   > 1. Allow things like `set time zone to '+08:00';` as described
   > 2. Allow `set time zone to 'US/Mountain';` or `set time zone to 'US/Denver';` or `set time zone to 'MT';`
   > 
   > This is due to the extra complication of what to do when someone does a `select TO_TIMESTAMP('2022-11-06T01:30:00');` and the server timezone is 'US/Mountain' and there are two distinct 1:30AMs one in MDT and one in MST. Do we know what postgres does here? The only sensible thing I can think to do is fail the query with an error.
   
   PostgreSQL's `to_timestamp` ouputs `timestamp with time zone`
   https://www.postgresql.org/docs/current/functions-formatting.html
   ```bash
   to_timestamp ( text, text ) → timestamp with time zone
   Converts string to time stamp according to the given format. 
   
   to_timestamp('05 Dec 2000', 'DD Mon YYYY') → 2000-12-05 00:00:00-05
   ```
   
   ```bash
   willy=# set timezone to 'America/Denver';
   SET
   willy=# select to_timestamp('05 Dec 2000', 'DD Mon YYYY');
         to_timestamp      
   ------------------------
    2000-12-05 00:00:00-07
   (1 row)
   ```
   it shows the fixed offset timezone. i think there's no ambiguity in this approach.
   
   The `TO_TIMESTAMP` for datafusion for now outputs `Timestamp<TimeUnit, None>`. 
   https://github.com/apache/arrow-datafusion/blob/6d00bd990ce5644181ad1549a6c70c8406219070/datafusion/expr/src/function.rs#L206-L220
   I think it will make more sense if these `TO_TIMESTSMP_XX` functions change the signatures to Timestamp with Timezone. This is in my backlog as well  . I was working on #2979 to align `TO_TIMESTAMP` to Postgrseql's but found that we don't have timezone support. So I moved the focus to finish the time zone support (from sqlparser, arrow-rs, and finally datafusion 😃 )
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1303293143

   @alamb  @liukun4515  @avantgardnerio @tustvold  would you mind helping to check whether this make sense?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1304872474

   > @waitingkuo thank you for your clear communication as always! You have the most reviewable PRs and issues I've seen 😄 .
   > 
   > The issue looks good to me, but I would propose tackling in stages:
   > 
   > 1. Allow things like `set time zone to '+08:00';` as described
   > 2. Allow `set time zone to 'US/Mountain';` or `set time zone to 'US/Denver';` or `set time zone to 'MT';`
   > 
   > This is due to the extra complication of what to do when someone does a `select TO_TIMESTAMP('2022-11-06T01:30:00');` and the server timezone is 'US/Mountain' and there are two distinct 1:30AMs one in MDT and one in MST. Do we know what postgres does here? The only sensible thing I can think to do is fail the query with an error.
   
   i'll limit this pr to support fixed offset timezone for now. thank you @avantgardnerio 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1303861810

   > You have the most reviewable PRs and issues I've seen 😄 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1304499949

   i think we might need to refactor the `return_type` to support dynamical time zone determine
   https://github.com/apache/arrow-datafusion/blob/6d00bd990ce5644181ad1549a6c70c8406219070/datafusion/expr/src/function.rs#L221-L223
   
   `now()`'s return type is `TImestamp<TimeUnt, Some("+08:00")>` for now. need to get the timezone from config_options to and return `TImestamp<TimeUnt, Some("SOME_TIMEZONE")>` instead. I'll prefer move the `now()` support to another issue/pr. 
   
   i.e. 2. now() returns the Timestamp<TimeUnit::Nanosecond, Some("SOME_TIMEZONE") according to the time zone we have. it's currently fixed as "UTC"
   will be implemented in this pr. we can do this instead for now
   ```bash
   ❯ set time zone to '+08:00';
   0 rows in set. Query took 0.000 seconds.
   
   ❯ select now();
   +----------------------------------+
   | now()                            |
   +----------------------------------+
   | 2022-11-05T11:11:51.713660+00:00 |
   +----------------------------------+
   1 row in set. Query took 0.003 seconds.
   
   ❯ select now()::timestamptz;
   +----------------------------------+
   | now()                            |
   +----------------------------------+
   | 2022-11-05T19:11:14.245835+08:00 |
   +----------------------------------+
   1 row in set. Query took 0.003 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1304563978

   > no ambiguity in this approach
   
   I disagree. The problem is that `America/Denver` is on a statuary (legally constructed) timezone called [Mountain Time](https://time.is/MT). Because Mountain Time (presently) observes Daylight Savings Time, it is two "real" timezones: [MDT](https://time.is/MDT) or UTC-6 and [MST](https://time.is/MST) UTC-7.
   
   ![image](https://user-images.githubusercontent.com/3855243/200126237-740b3d3d-9b37-4748-816f-89d8e371ffb5.png)
   
   Unfortunately, this Sunday (2022-11-06), MT will be switching from UTC-6 to UTC-7 at 02:00 MDT, which will cause the clocks to go to 01:00 MST, and about half an hour later it will be 1:30AM MT for the second time this year.
   
   So while I agree there is no ambiguity for `to_timestamp('05 Dec 2000', 'DD Mon YYYY')`, I think we need a test case specifically for `select TO_TIMESTAMP('2022-11-06T01:30:00');`, because given only that the only information known is:
   
   1. `2022-11-06T01:30:00`
   2. and `America/Denver`
   
   There is no way to infer if this means there is no way to know if that corresponds with `1667745000` (UTC-6) or `1667748600` (UTC-7).
   
   ![image](https://user-images.githubusercontent.com/3855243/200126506-ce557d10-4b63-43f4-a9a1-ba7eb8656e74.png)
   
   https://www.unixtimestamp.com/index.php
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1304872305

   @avantgardnerio thank you. I didn't aware this before. never live in the area that has the timezone switch. 
   
   I did some research in Postgrseql and Chrono-tz
   
   MST timezone offset: -7
   MDT timezone offset: -6
   
   In 2022, MDT
   began from  Sunday, 13 March, 02:00 (changed from -7 to -6)
   ended at  Sunday, 6 November, 02:00 (changed from -6 to -7)
   
   ```bash
   willy=# set timezone to 'America/Denver';
   SET
   ```
   this is valid (right before the timezone shift):
   ```bash
   willy=# select to_timestamp('2022-03-13 01:00', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-03-13 01:00:00-07
   (1 row)
   
   willy=# select to_timestamp('2022-03-13 01:59', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-03-13 01:59:00-07
   (1 row)
   ```
   
   begin from 2:00, it switches to MDT (-6)
   i think the logic behind for this hour is:
   parse as it's MST(-7), and then switch to MDT (-6), that's why we have an one hour shift here
   ```bash
   willy=# select to_timestamp('2022-03-13 02:00', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-03-13 03:00:00-06
   (1 row)
   
   willy=# select to_timestamp('2022-03-13 02:30', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-03-13 03:30:00-06
   (1 row)
   
   willy=# select to_timestamp('2022-03-13 02:59', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-03-13 03:59:00-06
   (1 row)
   ```
   
   and then the next hour it's parsed as MDT
   ```bash
   willy=# select to_timestamp('2022-03-13 03:00', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-03-13 03:00:00-06
   (1 row)
   ```
   
   
   In November 6th 2am MDT, time zone is switched back to MST (1am MST)
   
   let's begin with something unambiguous
   ```
   willy=# select to_timestamp('2022-11-06 00:59', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-11-06 00:59:00-06
   (1 row)
   ```
   
   for the next 1 hour, it's ambiguous since both MST and MDT has 1 am. and this is what postgresql does, it uses MST even though 1am MDT is valid as well
   ```bash
   willy=# select to_timestamp('2022-11-06 01:00', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-11-06 01:00:00-07
   (1 row)
   willy=# select to_timestamp('2022-11-06 01:59', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-11-06 01:59:00-07
   (1 row)
   ```
   
   after that, there's no ambiguity
   ```bash
   willy=# select to_timestamp('2022-11-06 02:00', 'YYYY-MM-DD HH24:MI');
         to_timestamp      
   ------------------------
    2022-11-06 02:00:00-07
   (1 row)
   ```
   
   conclusion for Postgrseql: when it's ambiguous or invalid, it's parsed as MST, and then switches to MDT if needed
   
   
   now let's see the behavior for `chrono-tz`
   ```rust
        let tz: Tz = "America/Denver".parse().unwrap();
       let dt = tz.datetime_from_str("2022-03-13T01:59", "%Y-%m-%dT%H:%M");
       println!("2022-03-13T01:59 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
       let dt = tz.datetime_from_str("2022-03-13T02:00", "%Y-%m-%dT%H:%M");
       println!("2022-03-13T02:00 -> {:?}", dt);
       let dt = tz.datetime_from_str("2022-03-13T02:59", "%Y-%m-%dT%H:%M");
       println!("2022-03-13T02:59 -> {:?}", dt);
       let dt = tz.datetime_from_str("2022-03-13T03:00", "%Y-%m-%dT%H:%M");
       println!("2022-03-13T03:00 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
   ```
   ```bash
   2022-03-13T01:59 -> Ok(2022-03-13T01:59:00MST) / 2022-03-13T01:59:00-07:00
   2022-03-13T02:00 -> Err(ParseError(Impossible))
   2022-03-13T02:59 -> Err(ParseError(Impossible))
   2022-03-13T03:00 -> Ok(2022-03-13T03:00:00MDT) / 2022-03-13T03:00:00-06:00
   ```
   ```rust
       let dt = tz.datetime_from_str("2022-11-06T00:59", "%Y-%m-%dT%H:%M");
       println!("2022-11-06T00:59 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
       let dt = tz.datetime_from_str("2022-11-06T01:00", "%Y-%m-%dT%H:%M");
       println!("2022-11-06T01:00 -> {:?}", dt);
       let dt = tz.datetime_from_str("2022-11-06T01:59", "%Y-%m-%dT%H:%M");
       println!("2022-11-06T01:59 -> {:?}", dt);
       let dt = tz.datetime_from_str("2022-11-06T02:00", "%Y-%m-%dT%H:%M");
       println!("2022-11-06T02:00 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
   ```
   ```bash
   2022-11-06T00:59 -> Ok(2022-11-06T00:59:00MDT) / 2022-11-06T00:59:00-06:00
   2022-11-06T01:00 -> Err(ParseError(NotEnough))
   2022-11-06T01:59 -> Err(ParseError(NotEnough))
   2022-11-06T02:00 -> Ok(2022-11-06T02:00:00MST) / 2022-11-06T02:00:00-07:00
   ```
   I think `chrono-tz` did a good job here.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1303856668

   @waitingkuo thank you for your clear communication as always! You have the most reviewable PRs and issues I've seen :smile: .
   
   The issue looks good to me, but I would propose tackling in stages:
   1. Allow things like `set time zone to '+08:00';` as described
   2. Allow `set time zone to 'US/Mountain';` or `set time zone to 'US/Denver';` or ``set time zone to 'MT';`
   
   This is due to the extra complication of what to do when someone does a `select TO_TIMESTAMP('2022-11-06T01:30:00');` and the server timezone is 'US/Mountain' and there are two distinct 1:30AMs one in MDT and one in MST. Do we know what postgres does here? The only sensible thing I can think to do is fail the query with an error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1307446761

   @avantgardnerio adding these options looks great as well.
   note that what chronos does is AMBIGUOUS_RAISE and posgresql is AMBIGUOUS_EARLIEST
   but then we need to implement more 😮 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1307018215

   @avantgardnerio 
   TLDR: 
   chrono-tz use 
   
   - timezone data in https://github.com/eggert/tz
   - https://github.com/chronotope/parse-zoneinfo as parser 
   
   
   -------------------------------------------------------------------------------------------
   
   IANA timezone database maintains the offset & leapseconds
   https://www.iana.org/time-zones
   
   this repo has the database and parser (written in c)
   https://github.com/eggert/tz
   here's the sample for North America
   https://github.com/eggert/tz/blob/main/northamerica
   
   chrono-tz use
   https://github.com/eggert/tz as the timezone database
   and https://github.com/chronotope/parse-zoneinfo as parser code
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1305852365

   > never live in the area that has the timezone switch
   
   @waitingkuo I think this is good advice :) Hopefully the US ends this nonsense next year.
   
   I find it troubling that postgres happily parses ambiguous timestamps and chooses a default arbitrarily, but it seems like the emerging trend for DataFusion is "just do what postgres does", so perhaps we just copy their flawed implementation.
   
   What's even more troubling than it defaulting an ambiguous timestamp is that it happily parses a non-existent `2:30AM` in March!
   
   I guess the only real choice we have is:
   
   1. do we follow postgres' arguably flawed behavior?
   2. or do we throw errors for ambiguous/invalid timestamps?
   
   Finally, I even wonder how chrono deals with all this? DST didn't always start or end in March/November, so a database is required to keep track of when various legal jurisdictions implement/modify/retract DST. In unix-based OSes I think this is a tzinfo file, but I know of no equivalent on Windows.
   
   If we do embrace `US/MT`, I think we should account for the [historical shifts](https://en.wikipedia.org/wiki/Daylight_saving_time_in_the_United_States#:~:text=By%20the%20Energy%20Policy%20Act,the%20first%20Sunday%20of%20November.) i.e. we need tests for April < 1986 rather than March.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #4106: Support `SET` timezone to non-UTC time zone

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #4106:
URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1309040786

   > or do we throw errors for ambiguous/invalid timestamps?
   
   This would be my preference I think -- no one likes how postgres handles things.
   
   What I would really prefer is that DataFusion always deals with UTC internally (like all internal math, expressions like now(), etc always use a UTC timestamp). I think this is what happens now
   
   If the input data doesn't specify a timezone, we should treat it like UTC (I know that is not what the arrow spec seems to say)
   
   If the user wants to see times / dates in their current locale's timezone, that can be handled by the end client (e.g. datafusion-cli) rather than forcing the entire processing chain to handle arbitrary timezones just for output display.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org