You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mhilton (via GitHub)" <gi...@apache.org> on 2023/09/29 07:57:10 UTC

[GitHub] [arrow-datafusion] mhilton opened a new issue, #7697: date_bin cannot use a string origin parameter when the expression has a time zone.

mhilton opened a new issue, #7697:
URL: https://github.com/apache/arrow-datafusion/issues/7697

   ### Describe the bug
   
   The optional third parameter to `date_bin`, which is the `origin` time, can normally be specified with a time constant in a string. If the input `expression` has a type that includes a time zone then datafusion returns the following error:
   
   ```
   Error during planning: No function matches the given name and argument types 'date_bin(Utf8, Timestamp(Nanosecond, Some("+00:00")), Utf8)'. You might need to add explicit type casts.
   	Candidate functions:
   	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(DayTime), Timestamp(Nanosecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Nanosecond, None))
   	date_bin(Interval(DayTime), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Microsecond, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(DayTime), Timestamp(Microsecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Microsecond, None))
   	date_bin(Interval(DayTime), Timestamp(Microsecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Millisecond, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(DayTime), Timestamp(Millisecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Millisecond, None))
   	date_bin(Interval(DayTime), Timestamp(Millisecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Second, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Second, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Second, None), Timestamp(Nanosecond, None))
   	date_bin(Interval(DayTime), Timestamp(Second, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
   	date_bin(Interval(MonthDayNano), Timestamp(Second, None))
   	date_bin(Interval(MonthDayNano), Timestamp(Second, Some("+TZ")))
   	date_bin(Interval(DayTime), Timestamp(Second, None))
   	date_bin(Interval(DayTime), Timestamp(Second, Some("+TZ")))
   ```
   
   ### To Reproduce
   
   To reproduce run the following script in `datafusion-cli`:
   
   ```
   -- Create a table with some timestamps in it.
   CREATE TABLE test (
     time TIMESTAMP WITH TIME ZONE
   ) AS VALUES
     ('2000-01-01T00:00:00Z'),
     ('2000-01-01T00:00:01Z');
   
   -- Check that the column preserved the time zone.
   SELECT arrow_typeof(time) FROM test;
   
   -- Check that date_bin runs without an origin.
   SELECT date_bin('1 minute', time) FROM test;
   
   -- Check that date_bin runs with an origin.
   SELECT date_bin('1 minute', time, '1970-01-01T00:00:00Z') FROM test;
   ```
   
   
   ### Expected behavior
   
   The `SELECT date_bin('1 minute', time) FROM test;` and `SELECT date_bin('1 minute', time, '1970-01-01T00:00:00Z') FROM test;` queries should produce identical results.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] date_bin cannot use a string origin parameter when the expression has a time zone. [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #7697: date_bin cannot use a string origin parameter when the expression has a time zone.
URL: https://github.com/apache/arrow-datafusion/issues/7697


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #7697: date_bin cannot use a string origin parameter when the expression has a time zone.

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #7697:
URL: https://github.com/apache/arrow-datafusion/issues/7697#issuecomment-1741131533

   @mhilton  and I just had chat and here are some notes:
   
   The way the code currently works is that `date_bin` lists several signatures it can handle, such as
   
   https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/built_in_function.rs#L1041-L1046
   
   If the user specifies arguments to that function as, eg, a String that doesn't match the required types, the datafusion coercion logic kicks in and adds casts on the specific arguments to make them conform to one of the available signatures. 
   
   The current convention is that `Timestamp(Second, Some("+TZ")))` effectively means "any `Timestamp` type with a timezone, and then the implementation of the function (in this case `date_bin`) must handle any possible timezone that comes in. 
   
   The problem with the current convention is that the coercion rules for `Timestamp(Second, Some("+TZ")))` https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/type_coercion/functions.rs#L222-L226
   
   Only Support casting from another timestamp, not from Strings, the way `Timestamp(Second, None))` does: 
   https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/type_coercion/functions.rs#L209-L216
   
   
   Thus I think the solution @mhilton  and I brainstormed is to change the signature to `Timestamp(Second, Some("+00:00)))` (aka only accept UTC timestamps) and teach the coercion logic to cast all argument to that time.  Then the implementation of `date_bin` only needs to handle one timezone (UTC)
   
   cc @wiedld 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org