You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "kyle-mccarthy (via GitHub)" <gi...@apache.org> on 2023/04/28 17:25:57 UTC

[GitHub] [arrow-datafusion] kyle-mccarthy opened a new issue, #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

kyle-mccarthy opened a new issue, #6155:
URL: https://github.com/apache/arrow-datafusion/issues/6155

   ### Describe the bug
   
   When performing an arithmetic operation on timestamps, `now()` causes an error when it is used as the LHS argument. It works as expected when it is used at the RHS argument.
   
   ```
   Internal("If RHS of the operation is an array, then LHS also must be")
   ```
   
   ### To Reproduce
   
   The following will return and error and then panic on unwrap.
   
   ```rust
   let ctx = SessionContext::new();
   
   let df = ctx
       .sql(
           r#"
   WITH cte AS (
   SELECT now() AS dt 
   )
   SELECT now() - cte.dt FROM cte
   "#,
       )
       .await
       .unwrap();
   ```
   
   ### Expected behavior
   
   The operation should be successfully evaluated.
   
   ### Additional context
   
   The following will successfully run. Note that `now()` is the RHS arg in this sample.
   
   ```rust
   let ctx = SessionContext::new();
   
   let df = ctx
       .sql(
           r#"
   WITH cte AS (
   SELECT now() AS dt 
   )
   SELECT cte.dt - now() FROM cte
   "#,
       )
       .await
       .unwrap();
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6155:
URL: https://github.com/apache/arrow-datafusion/issues/6155#issuecomment-1529708694

   This query generates an error for me using datafusion-cli (using the latest master branch):
   
   ```
   alamb@MacBook-Pro-8:~$ datafusion-cli
   DataFusion CLI v23.0.0
   ❯ WITH cte AS (
       SELECT now() AS dt
   )
   SELECT now() - cte.dt FROM cte
   ;
   Internal error: If RHS of the operation is an array, then LHS also must be. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] kyle-mccarthy commented on issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

Posted by "kyle-mccarthy (via GitHub)" <gi...@apache.org>.
kyle-mccarthy commented on issue #6155:
URL: https://github.com/apache/arrow-datafusion/issues/6155#issuecomment-1530734850

   @Weijun-H can you confirm that you ran the this, rather than the working one? I am using the latest release so it should reproduce.
   
   ```rust
   let ctx = SessionContext::new();
   
   let df = ctx
       .sql(
           r#"
   WITH cte AS (
       SELECT now() AS dt 
   )
   SELECT now() - cte.dt FROM cte
   "#,
       )
       .await
       .unwrap();
   ```
   
   Also @alamb I think that you are right. I did take a look at solving this and hoped that it would be as simple as providing an [inverse for this match arm](https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/datetime.rs#L120-L122), but I think some amount of refactoring may be required. 
   
   Currently, all the temporal compute related functions ([^1] [^2] [^3] [^4]) in the physical-expr crate expect LHS to be an array and the RHS to be a scalar. AFAIK, this makes sense for every operation expect for when both the LHS and RHS data types are timestamps (since it doesn't make sense to try and subtract timestamps from intervals).
   
   My initial inclination is to create an enum (TemporalValue) that has variants for the ArrayRef and ScalarValue and then update all the logic to accept the TemporalValue enum. Using an enum instead would require a decent amount of changes to the binary.rs and kernal_arrows.rs files in the expressions module.
   
   Is there a better way to do this? It looks like most compute related thing expect that the LHS is the array and that the RHS is the scalar. I figure that this is because most math operations where order matter can be rewritten?
   
   [^1]: https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary.rs#L1331-L1335
   [^2]: https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs#L316-L319
   [^3]: https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs#L381-L384
   [^4]: https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary.rs#L1348


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Weijun-H commented on issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on issue #6155:
URL: https://github.com/apache/arrow-datafusion/issues/6155#issuecomment-1529146500

   Hi @kyle-mccarthy, I'm unable to reproduce the error you mentioned on my machine. Could you please provide me with more information about the error so that I can better understand and address the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6155:
URL: https://github.com/apache/arrow-datafusion/issues/6155#issuecomment-1529709306

   I think this just means the arithmetic kernels for timestamps need to handle different combinations of ColumnarValue::Scalar and ColumnarValue::Array


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6155:
URL: https://github.com/apache/arrow-datafusion/issues/6155#issuecomment-1532094893

   > My initial inclination is to create an enum (TemporalValue) that has variants for the ArrayRef and ScalarValue and then update all the logic to accept the TemporalValue enum. Using an enum instead would require a decent amount of changes to the binary.rs and kernal_arrows.rs files in the expressions module.
   
   I wonder if we can use the existing `ColumnarValue` rather than a new enum? https://docs.rs/datafusion/latest/datafusion/physical_plan/enum.ColumnarValue.html ?
   
   
   > Is there a better way to do this? It looks like most compute related thing expect that the LHS is the array and that the RHS is the scalar. I figure that this is because most math operations where order matter can be rewritten?
   
   One way might be to use the https://docs.rs/datafusion/latest/datafusion/physical_plan/enum.ColumnarValue.html#method.into_array function to make the columnar values into arrays (even though this is not super efficient to turn a single value into an array).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6155: arithmetic operations on timestamps fail when `now()` is used as the LHS argument
URL: https://github.com/apache/arrow-datafusion/issues/6155


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org