Posted to issues@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/05/30 22:11:44 UTC

[GitHub] [arrow] felipecrv opened a new issue, #35830: [C++] Handle only relevant slices of child arrays when hashing scalars from ListArrays

felipecrv opened a new issue, #35830:
URL: https://github.com/apache/arrow/issues/35830

   ### Describe the enhancement requested
   
   The issue is illustrated by the Python code below:
   
   ```python
   import pyarrow as pa
   
   a = pa.array([
       [{'a': 5}, {'a': 6}],
       [{'a': 7}, None]
   ])
   b = pa.array([
       [{'a': 7}, None]
   ])
   
   # a[1] and b[0] are each represented as a 2-element slice of a child array of struct values.
   # The slices start at different offsets, but the values compare as equal.
   assert a[1] == b[0]
   
   # Logically equal values should hash to the same value, so hashing the child array
   # should start at the slice's offset and not at 0, as is done by default.
   hash1 = hash(a[1])
   hash2 = hash(b[0])
   assert hash1 == hash2
   ```
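
To make the offset problem concrete without depending on Arrow internals, here is a minimal pure-Python sketch. It is a toy model, not Arrow's implementation: `ToyListScalar`, `hash_from_zero`, and `hash_relevant_slice` are hypothetical names, and structs are modeled as `("a", value)` tuples so they are hashable.

```python
from dataclasses import dataclass


@dataclass
class ToyListScalar:
    """Hypothetical stand-in for a list scalar: a view into a shared child array."""
    child: tuple   # the full child array backing the parent list array
    offset: int    # index in `child` where this list's values start
    length: int    # number of child values that belong to this list

    def values(self) -> tuple:
        # Only this slice is logically part of the scalar.
        return self.child[self.offset:self.offset + self.length]


def hash_from_zero(s: ToyListScalar) -> int:
    # Models the behavior described above: the offset is ignored,
    # so hashing starts at index 0 of the child array.
    return hash(s.child[0:s.length])


def hash_relevant_slice(s: ToyListScalar) -> int:
    # Hash only the values the scalar actually refers to.
    return hash(s.values())


# Child arrays corresponding to `a` and `b` in the example above.
child_a = (("a", 5), ("a", 6), ("a", 7), None)
child_b = (("a", 7), None)

a1 = ToyListScalar(child_a, offset=2, length=2)  # models a[1]
b0 = ToyListScalar(child_b, offset=0, length=2)  # models b[0]

# The two scalars are logically equal and hash equally once the offset is honored.
assert a1.values() == b0.values()
assert hash_relevant_slice(a1) == hash_relevant_slice(b0)
```

In this model `hash_from_zero(a1)` hashes `(("a", 5), ("a", 6))`, which are values belonging to `a[0]`, so it has no reason to agree with the hash of `b0`.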
   
   #35814 fixes the bug for lists of structs, but the same bug might exist for other nested types:
   
   - [x] struct
   - [ ] sparse union
   - [ ] dense union
   - [ ] run-end encoded
   - [ ] more?
   
   ### Component(s)
   
   C++

