You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "2010YOUY01 (via GitHub)" <gi...@apache.org> on 2023/04/26 21:35:12 UTC

[GitHub] [arrow-datafusion] 2010YOUY01 opened a new issue, #6133: Bug in round() math function

2010YOUY01 opened a new issue, #6133:
URL: https://github.com/apache/arrow-datafusion/issues/6133

   ### Describe the bug
   
   I noticed this bug when reviewing panics for this issue https://github.com/apache/arrow-datafusion/issues/3316
   https://github.com/apache/arrow-datafusion/blob/a38480951f40abce7ee2d5919251a1d1607f1dee/datafusion/physical-expr/src/math_expressions.rs#L215
   For `round(num, decimal_place)` function, when `decimal_place` overflow Int32 range, it will panic.
   
   ### To Reproduce
   
   ```sql
   -- max Int32 is 2147483647
   select round(3.14, 2147483648);
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   For a quick fix, the code can return an Execution Error instead of panicking. 
   
   However, it seems better to return a Planning Error during the Parsed Tree -> Logical Plan (aka binding) stage: (1) the `round(num, decimal_place)` implementation relies on the standard library, which only accepts `i32` for `decimal_place`. As a result, the function signature for decimal_place should be `int32`. (2) be more compatible with Postgres, it will return planning error for such query and the function signature for `decimal_place` is 32 bit int.
   
   Currently, all integers are parsed as i64 during planning. Implementing the suggested change might require more modifications. I would like to know if such changes are desired.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6133: panic in round() math function

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6133:
URL: https://github.com/apache/arrow-datafusion/issues/6133#issuecomment-1529701257

   I think making this query error rather than panic would be a great firs state
   
   I don't fully understand the benefit of the proposed fix in `Additional context` section -- @2010YOUY01  what would be better about your proposed solution than just erroring at runtime?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] 2010YOUY01 commented on issue #6133: panic in round() math function

Posted by "2010YOUY01 (via GitHub)" <gi...@apache.org>.
2010YOUY01 commented on issue #6133:
URL: https://github.com/apache/arrow-datafusion/issues/6133#issuecomment-1558478558

   > > Potential solution:
   > 
   > I agree this sounds reasonable to me. Thanks for writing it down @2010YOUY01
   
   Cool! I will work on that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] 2010YOUY01 commented on issue #6133: panic in round() math function

Posted by "2010YOUY01 (via GitHub)" <gi...@apache.org>.
2010YOUY01 commented on issue #6133:
URL: https://github.com/apache/arrow-datafusion/issues/6133#issuecomment-1555164645

   Here is a more detailed description of the alternative fix:
   Currently, 2-args round function has a signature `round(num: Float, decimal_places: Int64) -> Float`, the underlying std library implementation of `round()` math functions can only handle 32-bit decimal places. I think changing the `decimal_places` type to `Int32` in the function signature can better solve this panic issue for these reasons:
   1. Better error handling:
   Suppose there is a batch of data for Int64 decimal places, for the error-in-execution approach, one bad value can kill the whole query. The alternative approach can make sure there won't be any execution error, potential overflow will be caught during binding.
   ```sql
   CREATE TABLE round_data (c1 BIGINT);
   INSERT INTO round_data (c1) VALUES (1), (2), (2147483648);
   SELECT round(3.1415, c1) from round_data;
   ```
   2. There are 3 other unimplemented Postgres math functions that have arguments with `Int32` type (for example 2-args `trunc()`), they don't have to worry about the same overflow issue when implementing.
   
   Potential solution:
   1. Change function argument to `Int32` in function definition and implementation 
   2. Since all scalar integer literals(like this 2147483648 in `select round(3.14, 2147483648);` ) are by default bind to `Int64` type inside datafusion, we can cast them when binding functions [here](https://github.com/apache/arrow-datafusion/blob/dd5e1dbbfd20539b40ae65acb8883f7e392cba92/datafusion/sql/src/expr/function.rs#L53), and throw planning error if overflows.
   (If by default parse integer literal into `Int32`, there will be hundreds of test fails to fix, not so doable. Then I think handling this edge case during planning is better than during execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6133: panic in round() math function

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6133:
URL: https://github.com/apache/arrow-datafusion/issues/6133#issuecomment-1557970462

   > Potential solution:
   
   I agree this sounds reasonable to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org