You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/24 09:48:45 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #1936: Cast Kernel Ignores Timezone

tustvold opened a new issue, #1936:
URL: https://github.com/apache/arrow-rs/issues/1936

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   The beginnings of timezone support were added in #824, however, this is currently ignored by the cast kernel
   
   **Describe the solution you'd like**
   
   Timezones should be correctly handled by the cast kernel
   
   **Describe alternatives you've considered**
   
   We could not support timezones
   
   **Additional context**
   
   Noticed whilst investigating #1932 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] doki23 commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
doki23 commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1253131853

   We consider tz only if from_type and to_type both needs it. For example, ignoring tz is ok when we cast ts to i64, because i64 array doesn't care about timezone.
   
   So, there're 2 situations:
   
   1. ts is from_type, and the to_type needs tz.
   2. ts is to_type, and the from_type has tz.
   
   - when ts is from type
   I noticed that timestamp array is always treat as `PrimitiveArray`. We can't get tz from `ArrowPrimitiveType::DATA_TYPE`, because there're only utc ts definitions like:
   ```rust
   make_type!(
       TimestampSecondType,
       i64,
       DataType::Timestamp(TimeUnit::Second, None)
   );
   ```
   So, `ArrayData::data_type` makes sense. 
   
   - ts is to type
   We may need a `TimestampBuilder::from` in place of `TimestampSecondArray::from_xxx` which can realize the timezone.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1309690020

   Yeah let's do the "hard" operation in the cast kernel, and if people don't like it, they can perform a metadata-only cast using`with_timezone_opt` :+1:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1309670226

   > what's expected behavior for casting timestamp with timezone to timestamp without time zone?
   
   The specification states
   
   > /// However, if a Timestamp column has no timezone value, changing it to a
   > /// non-empty value requires to think about the desired semantics.
   > /// One possibility is to assume that the original timestamp values are
   > /// relative to the epoch of the timezone being set; timestamp values should
   > /// then adjusted to the Unix epoch (for example, changing the timezone from
   > /// empty to "Europe/Paris" would require converting the timestamp values
   > /// from "Europe/Paris" to "UTC", which seems counter-intuitive but is
   > /// nevertheless correct).
   
   As stated above, given this is the only possibility enumerated I think we should follow this. The inverse operation, i.e. removing a timezone, I would therefore expect to do the reverse i.e. `1970-01-01 01:00:01 +01:00` would become `1970-01-01 01:00:01`. This is consistent with both postgres and chrono.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1936: Cast Kernel Ignores Timezone

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1543876228

   Reopening as there are still issues around casting between timestamps with timezones #4201


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1936: Cast Kernel Ignores Timezone

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1452380894

   https://github.com/apache/arrow-rs/issues/3794 is related to this, and implements the necessary machinery for timezone aware parsing of strings


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1253816815

   > what's expected behavior for casting timestamp with timezone to timestamp without time zone?
   
   The intuitive answer would be to convert it to UTC. I think postgres effectively casts it to server (local) time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] waitingkuo commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1252765678

   @doki23 are you planning to draft the proposal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] doki23 commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
doki23 commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1254488901

   > what's expected behavior for casting timestamp with timezone to timestamp without time zone?
   Tz has no affect to the value, it just used for display.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] doki23 commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
doki23 commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1250190014

   I'd like to work on this. Could you assign it to me? @tustvold 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] waitingkuo commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1250859927

   @tustvold thank you for pinging me, i'm working on these things now as well. 
   
   @doki23 it would be great if you could help ❤️ 
   You could find some other related issues here apache/arrow-datafusion#3148
   
   some hints that might help
   
   1
   https://github.com/apache/arrow-rs/blob/3bf6eb98ceb3962e1d9419da6dc93e609f7893e6/arrow/src/compute/kernels/cast.rs#L1284
   to make casting function consider timezone, we have to fix the second `_` identifier and check whether it's `None` or `Some`
   
   2
   https://github.com/apache/arrow-rs/blob/3bf6eb98ceb3962e1d9419da6dc93e609f7893e6/arrow/src/array/array_primitive.rs#L209
   while using fmt to print, we first convert it to `NaiveDateTime` (from chrono-rs) which contains no timezone info so that you could only see timestamp without timezone
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] waitingkuo commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1309689139

   thank you @tustvold , didn't aware this spec before
   This looks good to me
   
   btw, i think `with_timezone_opt` already covered another possibility - keep the underline i64 value and change the metadata


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1250214349

   I would recommend writing up the expected behaviour first, as timezone handling is notoriously messy, and once we have consensus we can move forward with implementing that.
   
   FYI @avantgardnerio @waitingkuo 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] doki23 commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
doki23 commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1253055929

   > @doki23 are you planning to draft the proposal?
   
   sure, plz wait me hours


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #1936: Cast Kernel Ignores Timezone

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #1936: Cast Kernel Ignores Timezone
URL: https://github.com/apache/arrow-rs/issues/1936


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1253830499

   I believe this section of the arrow schema definition is relevant - https://github.com/apache/arrow-rs/blob/master/format/Schema.fbs#L280
   
   In particular
   
   > One possibility is to assume that the original timestamp values are relative to the epoch of the timezone being set; timestamp values should then adjusted to the Unix epoch
   
   Given this is the only possibility enumerated in the schema, I feel this is probably the one we should follow unless people feel strongly otherwise. My 2 cents is that anything relying on the local timezone of the system is best avoided if at all possible, it just feels fragile and error-prone.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1253834426

   > local timezone of the system
   
   Yes - these RecordBatches could be part of Flights, yes? In which case the whole point is to send them around to different computers that may be in different timezones, so it kind of forces our hand here.
   
   And if we are doing it this way in arrow where we don't have the luxury of following postgres, maybe this is also where we break postgres compatibility in DataFusion. Just because postgres did it wrong doesn't mean we should follow...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] waitingkuo commented on issue #1936: Cast Kernel Ignores Timezone

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #1936:
URL: https://github.com/apache/arrow-rs/issues/1936#issuecomment-1253276790

   what's expected behavior for casting timestamp with timezone to timestamp without time zone?
   
   e.g. if the timestamp with timezone is `1970-01-01T08:00:00+08:00` (note that the timestamp underline is 0), what's the casted result? `1970-01-01T08:00:00` (underline timestamp as 28800000000000) or `1970-01-01T00:00:00`  (underline timestamp as 0)?
   
   i recommend that listing these ambiguous cases and your proposed behavior so we could discuss
   
   btw i tested it on pyarrow, it simply changes the datatype but not change the underline timestamp (`1970-01-01T08:00:00+08:00` becomes `1970-01-01T00:00:00`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #1936: Cast Kernel Ignores Timezone

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #1936: Cast Kernel Ignores Timezone
URL: https://github.com/apache/arrow-rs/issues/1936


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org