You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/03 13:16:40 UTC

[GitHub] [arrow-datafusion] tustvold commented on pull request #5811: Scalar arithmetic should return error when overflows.

tustvold commented on PR #5811:
URL: https://github.com/apache/arrow-datafusion/pull/5811#issuecomment-1494309259

   > mostly for consistency with the semantics of arrow-rs
   
   So this is not consistent with the arrow kernels that DataFusion makes use of, in particular, DataFusion is currently using the unchecked arithmetic kernels, which do not return an error on overflow. So this PR will introduce inconsistency between ScalarValue arithmetic, and any arithmetic involving arrays.
   
   The major reason I'm a little apprehensive about changing this is that the checked kernels are at least an order of magnitude slower in the presence of nulls, as LLVM can't vectorise them correctly
   
   ```
   add(0.1)                time:   [7.4481 µs 7.4539 µs 7.4608 µs]
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   add_checked(0.1)        time:   [101.62 µs 106.31 µs 111.40 µs]
   ```
   
   I definitely think whatever semantics we settle on should be consistent and well documented, but I don't have a strong opinion what it should be. However, I do feel that changing the current semantics warrants some communication due to the major performance regression it would entail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org