Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/16 20:11:58 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #34570: GH-34568: [C++][Python] Expose Run-End Encoded arrays in Python Arrow

jorisvandenbossche commented on PR #34570:
URL: https://github.com/apache/arrow/pull/34570#issuecomment-1472673013

   Something else: you can create an invalid REE array with non-increasing run ends:
   
   ```
   In [2]: a= pa.RunEndEncodedArray.from_arrays(5, [2, 4, 2, 5], [1, 2, 3, 4])
   
   In [3]: a
   Out[3]: 
   <pyarrow.lib.RunEndEncodedArray object at 0x7f8138ffa920>
   
   -- run_ends:
     [
       2,
       4,
       2,
       5
     ]
   -- values:
     [
       1,
       2,
       3,
       4
     ]
   
   In [4]: pc.run_end_decode(a)
   Segmentation fault (core dumped)
   ```
   
   As you can see, decoding it then segfaults.
   Full validation does catch this:
   
   ```
   In [4]: a.validate(full=True)
   ...
   ArrowInvalid: Every run end must be strictly greater than the previous run end, but run_ends[2] is 2 and run_ends[1] is 4
   ```
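A minimal sketch in plain Python, with no pyarrow dependency, of the two pieces involved here: the run-end check that full validation performs, and a naive decoder. The function names `validate_run_ends` and `run_end_decode_py` are hypothetical and are not pyarrow APIs. The error message only imitates the one above. A decoder derives each run's length as the difference between consecutive run ends. With the run ends from the example, the third run would have length 2 - 4 = -2. A native kernel that trusts its input has no defined behaviour in that case, which is plausibly where the segfault comes from. This sketch validates first and rejects the input instead.

```python
def validate_run_ends(run_ends):
    """Raise ValueError unless run_ends is strictly increasing (sketch of the full check)."""
    for i in range(1, len(run_ends)):
        if run_ends[i] <= run_ends[i - 1]:
            raise ValueError(
                "Every run end must be strictly greater than the previous run end, "
                f"but run_ends[{i}] is {run_ends[i]} and run_ends[{i - 1}] is {run_ends[i - 1]}"
            )


def run_end_decode_py(length, run_ends, values):
    """Expand a run-end encoded array into a plain list of `length` values (hypothetical helper)."""
    if len(run_ends) != len(values):
        raise ValueError("run_ends and values must have the same length")
    # Do the O(n) check up front so the loop below can trust the run ends.
    validate_run_ends(run_ends)
    out = []
    prev_end = 0
    for end, value in zip(run_ends, values):
        # Each run covers the logical positions [prev_end, end), clipped to `length`.
        run_length = min(end, length) - prev_end
        if run_length <= 0:
            break
        out.extend([value] * run_length)
        prev_end = end
    if len(out) != length:
        raise ValueError("run ends do not cover the logical length")
    return out
```

Under these assumptions, `run_end_decode_py(5, [2, 4, 5], [1, 2, 3])` gives `[1, 1, 2, 2, 3]`. The run ends from the example, `[2, 4, 2, 5]`, raise a `ValueError` before any output is written.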
   
   However, the constructor only runs the cheap validation (i.e. without `full=True`). I suppose it is always a trade-off which checks count as necessary and cheap enough to run on construction, and which belong only to the full validation. The same is true for offsets in variable-size list/binary arrays.
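The same cheap-versus-full split can be sketched for the offsets of a variable-size list or binary array. This is a hypothetical illustration of where the line might be drawn. It does not describe what Arrow's `validate()` actually checks, and both function names are made up. The cheap check looks only at sizes and the first and last offsets, so it takes constant time. The full check walks every offset, which costs O(n) and is why running it in every constructor is expensive.

```python
def cheap_validate_offsets(offsets, length, data_size):
    """O(1) checks: offset count and the first/last offsets only (hypothetical split)."""
    if len(offsets) != length + 1:
        raise ValueError(f"expected {length + 1} offsets, got {len(offsets)}")
    if offsets[0] < 0 or offsets[-1] > data_size:
        raise ValueError("first/last offset out of bounds for the data buffer")


def full_validate_offsets(offsets, length, data_size):
    """O(n) check: additionally require the offsets to be non-decreasing."""
    cheap_validate_offsets(offsets, length, data_size)
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            raise ValueError(
                f"offsets[{i}] ({offsets[i]}) is smaller than offsets[{i - 1}] ({offsets[i - 1]})"
            )
```

For three strings over a 5-byte data buffer, the offsets `[0, 3, 1, 5]` pass the cheap check because the count and the endpoints look fine. The full check rejects them, which mirrors the REE case above.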

