Posted to github@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/07/25 02:48:30 UTC

[GitHub] [arrow] felipecrv commented on pull request #36800: GH-36789: [C++] Support divide(duration, duration)

felipecrv commented on PR #36800:
URL: https://github.com/apache/arrow/pull/36800#issuecomment-1648934117

   This implementation is very elegant and simple, but it's giving up some accuracy we can reclaim with a bit more careful handling of the units.
   
   If I understand correctly, each DURATION operand is converted to a FLOAT64 number of seconds before the division, so `x NANO / y MILLI` will first divide `x` by `1e9`, then `y` by `1e3`, and then perform another division with those two results.
   
   There is a considerable accuracy loss for low-valued `x` [1]. An alternative way to do it is to first build a fraction from the units. In this `NANO/MILLI` example that would be `1e-9/1e-3 = 1e-6`.
   
   So the actual computation would be `(x * 1e-6) / y`. This is also more efficient: one `*` and one `/` instead of 3 `/`.
   
   My concern can probably be ignored if numpy/pandas also does the naive scaling on both operands and then divides the results.
   
   @pitrou what do you think?
   
   ![image](https://github.com/apache/arrow/assets/207795/aac4f042-4e43-4b0c-bd45-b072ed80d6b6)
   
   [1] https://herbie.uwplse.org/demo/8908317b92cc5eb8646c2968ab4956e4e96c9cea.2.0/graph.html

