Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/04/13 08:02:59 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #35088: [Python] pyarrow.compute.subtract_checked overflowing for some duration arrays constructed from numpy

jorisvandenbossche commented on issue #35088:
URL: https://github.com/apache/arrow/issues/35088#issuecomment-1506526303

   @lukemanley thanks for the report. This is an interesting bug... The difference between the two arrays, which appear to be the same, is that their underlying data buffers differ because they were created in different ways (the affected values are masked as null, so in theory the actual value "behind" the null shouldn't matter).
   "Viewing" the data buffer as an int64 array to see the values:
   
   ```
   In [20]: pa.Array.from_buffers(pa.int64(), 1, [None, arr2.buffers()[1]])
   Out[20]: 
   <pyarrow.lib.Int64Array object at 0x7f4c1af64820>
   [
     0
   ]
   
   In [21]: pa.Array.from_buffers(pa.int64(), 1, [None, arr3.buffers()[1]])
   Out[21]: 
   <pyarrow.lib.Int64Array object at 0x7f4bf5998dc0>
   [
     -9223372036854775808
   ]
   ```
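   
   For reference, here is a minimal sketch of how two such arrays can end up with different data buffers. The construction below is hypothetical (it is not the exact code from the report), but numpy uses `INT64_MIN` as its `NaT` sentinel, which matches the value seen in `arr3`'s buffer above, whereas a null created from a Python `None` typically leaves the slot zeroed:
   
   ```
   import numpy as np
   import pyarrow as pa
   
   # Hypothetical reconstruction: a null created from a Python None vs. a null
   # created from a numpy NaT (numpy's NaT sentinel is INT64_MIN).
   arr_from_none = pa.array([None], type=pa.duration("us"))
   arr_from_nat = pa.array(np.array(["NaT"], dtype="timedelta64[us]"))
   
   # Same trick as above: view each data buffer as int64 to see the value
   # stored "behind" the null (expected: 0 for the first, -2**63 for the second).
   for arr in (arr_from_none, arr_from_nat):
       print(pa.Array.from_buffers(pa.int64(), 1, [None, arr.buffers()[1]]))
   ```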
   
   So my assumption is that the overflow comes from actually subtracting the values in the second case (`86400000000 - (-9223372036854775808)` would indeed overflow).
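   
   Spelling out that arithmetic in plain Python, just to confirm the magnitude:
   
   ```
   INT64_MAX = 2**63 - 1                         # 9223372036854775807
   result = 86400000000 - (-9223372036854775808)
   print(result)                                 # 9223372036941175808
   print(result > INT64_MAX)                     # True -> a checked int64 subtraction must report overflow
   ```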
   
   However, the way `subtract_checked` is implemented, it _should_ normally only perform the actual subtraction for data values that are not masked as null, exactly to avoid situations like the above (see the sketch below). So it seems there is a bug in this mechanism for skipping values behind nulls.
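   
   As an illustration only, here is a pure-Python sketch of that intended mechanism (hypothetical names and structure, not Arrow's actual C++ kernel): the validity bitmap is consulted first, so whatever sentinel sits in the data buffer behind a null slot should never reach the overflow check.
   
   ```
   INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
   
   def subtract_checked_sketch(left, right, left_valid, right_valid):
       # Only perform the checked subtraction for slots that are valid on both sides.
       out, out_valid = [], []
       for l, r, lv, rv in zip(left, right, left_valid, right_valid):
           if not (lv and rv):
               out.append(0)            # the value stored behind a null is never read
               out_valid.append(False)  # the result slot is null; no arithmetic is done
               continue
           res = l - r
           if not (INT64_MIN <= res <= INT64_MAX):
               raise OverflowError("overflow in checked 64-bit subtraction")
           out.append(res)
           out_valid.append(True)
       return out, out_valid
   ```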


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org