Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/03/06 17:32:27 UTC

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1456593148

   I have to confess to not really being sure how to move forward with this, as I have a number of concerns:
   
   * The performance will be catastrophic: it formats values to strings and performs multiple memory allocations per operation
   * The data type of the output depends on the input data, something that most query engines, including DataFusion, aren't set up to handle (see the result-type sketch after this list)
   * The precision loss bleeds across rows, and therefore depends on how the data happens to be batched, which typically isn't guaranteed in systems like DataFusion
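
   For context, the arithmetic I believe this is modelled on (as far as I recall the Spark convention) gives a multiply of Decimal(p1, s1) and Decimal(p2, s2) the result type Decimal(p1 + p2 + 1, s1 + s2), capped at Decimal128's maximum of 38 digits. A purely illustrative sketch of where the loss comes from; none of these names exist in arrow-rs:

   ```rust
   /// Maximum number of significant digits a Decimal128 can hold.
   const MAX_PRECISION: u8 = 38;

   /// Illustrative only: the Spark-style (precision, scale) of `a * b`,
   /// where `a` is Decimal(p1, s1) and `b` is Decimal(p2, s2).
   fn multiply_result_type(p1: u8, s1: u8, p2: u8, s2: u8) -> (u8, u8) {
       let precision = p1 + p2 + 1;
       let scale = s1 + s2;
       if precision <= MAX_PRECISION {
           (precision, scale)
       } else {
           // The required precision no longer fits in 38 digits, so fractional
           // digits have to be dropped: this is where the silent loss comes in.
           let excess = precision - MAX_PRECISION;
           (MAX_PRECISION, scale.saturating_sub(excess))
       }
   }

   fn main() {
       // Decimal128(38, 10) * Decimal128(38, 10) notionally needs precision 77,
       // so the scale collapses from 20 to 0 to fit within 38 digits.
       assert_eq!(multiply_result_type(38, 10, 38, 10), (38, 0));
   }
   ```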
   
   My understanding is this is trying to emulate the behaviour of Spark decimals, which store precision and scale per field. However, I'm not sure this PR actually achieves this: in the event of overflow it will truncate the precision of all values in the array, potentially truncating values that wouldn't have been truncated by Spark (a toy example of this is sketched below). I'm therefore not really sure what this PR gains us: doesn't it just trade a behaviour change that results in an error for one that results in silent truncation? Am I missing something here?
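
   To make the row-bleed concern concrete, here's a toy example in plain i128 arithmetic (the scales and values are made up, and `rescale` is not an arrow-rs kernel): one overflowing row forces the scale of the whole output array down, and an unrelated row that never overflowed is truncated along with it.

   ```rust
   /// Illustrative only: reduce the scale of a decimal stored as an i128,
   /// truncating (not rounding) the dropped fractional digits.
   fn rescale(value: i128, from_scale: u32, to_scale: u32) -> i128 {
       assert!(from_scale >= to_scale);
       value / 10_i128.pow(from_scale - to_scale)
   }

   fn main() {
       // 1.5 * 2.5 = 3.75, stored as 375 at scale 2. This row never overflowed.
       let small_product = 375_i128;

       // Suppose another row in the same batch overflowed, so the output
       // array's scale was reduced from 2 to 0 for every row. This value
       // silently becomes 3, losing digits that the per-field behaviour
       // described above would have kept.
       assert_eq!(rescale(small_product, 2, 0), 3);

       // Had the batch boundaries fallen differently and the overflowing row
       // landed elsewhere, this row would have kept its scale, which is why
       // the result depends on how the data happens to be batched.
       assert_eq!(rescale(small_product, 2, 2), 375);
   }
   ```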
   

