Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/04/07 08:44:23 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #34944: [Python] PyArrow pa.array fails intermittently with custom iterable object

jorisvandenbossche commented on issue #34944:
URL: https://github.com/apache/arrow/issues/34944#issuecomment-1500079228

   @bdice thanks for the report! I can reproduce this (I actually seem to get the crash consistently)
   
   It might also be an upstream Python issue? Because if I replace `pa.array(A())` with `list(A())`, this hangs / blows up memory. 
   You call it an "iterable" object, but since it doesn't have a length, it will iterate indefinitely?
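
A minimal sketch of why that would happen, assuming (the report's class `A` is not reproduced here) that the object defines `__getitem__` but neither `__len__` nor `__iter__`. The class `Endless` below is a hypothetical stand-in, not the reporter's code. In that situation Python falls back to the legacy sequence protocol and calls `__getitem__(0)`, `__getitem__(1)`, ... until an `IndexError` is raised, which never happens here, so `list(...)` would never finish:

```python
from itertools import islice


class Endless:
    """Hypothetical stand-in: indexable, no __len__, never raises IndexError."""

    def __getitem__(self, index):
        return index * 2


obj = Endless()

# iter() succeeds through the legacy __getitem__ protocol, so the object
# is iterable even though it defines neither __iter__ nor __len__.
# islice() bounds the otherwise endless iteration to the first five items.
first_five = list(islice(obj, 5))
has_len = hasattr(obj, "__len__")

print(first_five)  # [0, 2, 4, 6, 8]
print(has_len)     # False
```

Calling `list(obj)` directly on such an object keeps appending items until memory runs out, which would match the hang described above.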
   
   In the C++ code handling generic input, we first try to convert any python object to an actual sequence:
   
   https://github.com/apache/arrow/blob/7526df9ad97219cc44f9b460887405f2b0e86fd4/python/pyarrow/src/arrow/python/python_to_arrow.cc#L1098-L1101
   
   It seems that this object is passing the `if (PySequence_Check(obj))` check, which I find a bit surprising (in pure Python, it's not considered a `Sequence` according to `collections.abc`)
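
The difference between the two checks can be sketched from Python itself. This is an illustration under assumptions, not the code path PyArrow runs: `OnlyGetitem` is a hypothetical class, and `ctypes.pythonapi` is used only to call the C-level `PySequence_Check` directly (CPython only). As far as I know, `PySequence_Check` only looks at whether the type provides a sequence-style `__getitem__` (and is not a dict), whereas `collections.abc.Sequence` requires inheriting from it or registering with it:

```python
import collections.abc
import ctypes


class OnlyGetitem:
    """Hypothetical class: defines __getitem__ and nothing else."""

    def __getitem__(self, index):
        return index


# Bind the C API function PySequence_Check(PyObject *) -> int.
check = ctypes.pythonapi.PySequence_Check
check.argtypes = [ctypes.py_object]
check.restype = ctypes.c_int

obj = OnlyGetitem()

c_level = bool(check(obj))                              # C API view
abc_level = isinstance(obj, collections.abc.Sequence)   # pure Python view
dict_level = bool(check({}))                            # dicts are excluded

print(c_level, abc_level, dict_level)  # True False False
```

So an object can pass the C-level check while having no usable length, which is consistent with the behaviour described above.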
   
   

