You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/07 18:05:09 UTC

[GitHub] [arrow-datafusion] adamhooper commented on issue #686: Specific timezone support for `to_timetamp*()`

adamhooper commented on issue #686:
URL: https://github.com/apache/arrow-datafusion/issues/686#issuecomment-875815801


   > > The best solution I can think of would be for to_timestamp(...) to support a second, optional argument where the timezone can be specified.
   > 
   > I think a second optional argument for `to_timestamp` is a fine solution that would be backwards compatible and also allow for improvements going forward
   
   I'm not even a user (yet); I see two big great reasons not to add an argument:
   
   *1. Ambiguous calling pattern*. When converting string to Arrow timestamp, there are two timezones to consider: the input-string timezone and the output-timestamp timezone. Which one would this argument mean?
   
   If I see the call, `TO_TIMESTAMP('2021-01-01T00:00:00', 'America/Montreal')`, then only result I can imagine as "useful" is `2021-01-01T05:00Z` -- that is, "the input is from my clock." (@westonpace, I think we're aligned here.)
   
   Someone else might expect `2020-12-31T19:00-0500`, saying, "I have UTC input and I want Montreal wall-clock time." A third person might expect `2021-01-01Z` with timezone-metadata `America/Montreal`.
   
   If three smart people reading the same line of code would interpret it three different ways, that's a design problem.
   
   *2. PostgreSQL compatibility*. Postgres has two timestamp types; Arrow timestamp+tz-metadata columns are neither.
   
   In Postgres, `TO_TIMESTAMP()` creates a `TIMESTAMP WITH TIME ZONE` -- that is, a moment on our shared, global timeline, stored as 64-bit integer since the UNIX epoch. It has no timezone: each moment happens once everywhere in the world, regardless of timezone.
   
   Postgres also has a `TIMESTAMP WITHOUT TIME ZONE` type -- that is, a calendar-date-plus-wall-clock type ... but Postgres' own `TO_TIMESTAMP()` doesn't output it. That's on purpose. I think most people using `TIMESTAMP WITHOUT TIME ZONE` are using it in error, misled by the name. It represents the very opposite of what it sounds like. In Postgres, it's handy for calendaring apps and not much else. (There are other uses in Pandas/R.)
   
   Postgres has gone 30 years without Arrow's timestamp+tz-metadata types.
   
   In the words of Tom Lane, ["if we simply took away to_timestamp(), most users would be better off"](https://www.postgresql.org/message-id/1264125.1623208850%40sss.pgh.pa.us).
   
   Be careful about adding a feature: removing one might help more users accomplish more tasks, more smoothly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org