Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/09/09 19:28:11 UTC

[GitHub] [arrow-rs] wjones127 opened a new issue, #4805: pyarrow module can't roundtrip tensor arrays

wjones127 opened a new issue, #4805:
URL: https://github.com/apache/arrow-rs/issues/4805

   **Describe the bug**
   
   When arrow-rs exports a tensor array (a kind of extension array) to PyArrow as part of a record batch, PyArrow segfaults. This does not happen when the batch is exported as a stream.
   
   **To Reproduce**
   
   The following test fails when added to `arrow-pyarrow-integration-testing/tests/test_sql.py`:
   
   ```python
   import pyarrow as pa
   import arrow_pyarrow_integration_testing as rust
   
   def test_tensor_array():
       tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
       inner = pa.array([float(x) for x in range(1, 7)] + [None] * 12, pa.float32())
       storage = pa.FixedSizeListArray.from_arrays(inner, 6)
       f32_array = pa.ExtensionArray.from_storage(tensor_type, storage)
   
       # Round-tripping as an array gives back storage type, because arrow-rs has
       # no notion of extension types.
       b = rust.round_trip_array(f32_array)
       assert b == f32_array.storage
   
       batch = pa.record_batch([f32_array], ["tensor"])
       b = rust.round_trip_record_batch(batch)
       assert b == batch
   
       del b
   ```
   
   **Expected behavior**
   
   The record batch, including the extension type of the tensor column, should round-trip successfully.
   
   **Additional context**
   
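   Record batch export is done by exporting each column as an individual array, which separates the extension arrays from their metadata (an extension type is recorded in the field metadata under `ARROW:extension:name` and `ARROW:extension:metadata`, not in the array data). I suspect PyArrow segfaults because it first receives a plain array and is only later told, via the final schema, that the column is an extension type.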
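To make the suspected cause concrete, here is a toy model in plain Python; no pyarrow or arrow-rs code is involved. It assumes only what the Arrow format specifies: an extension type is identified by the `ARROW:extension:name` and `ARROW:extension:metadata` keys in a field's metadata, not by anything in the array's buffers. The `Field`, `ExportedArray`, `export_columns_individually`, and `export_with_schema` names are hypothetical and do not correspond to any arrow-rs or PyArrow API; the sketch only illustrates how exporting columns without their fields drops the extension information, which is the suspicion above rather than a confirmed diagnosis.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Field-metadata keys reserved by the Arrow format for extension types.
EXT_NAME = "ARROW:extension:name"
EXT_META = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical schema field: a name, a storage type, and metadata."""
    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ExportedArray:
    """Hypothetical view of what a consumer receives for one column."""
    storage_type: str
    metadata: Dict[str, str]


def export_columns_individually(fields: List[Field]) -> List[ExportedArray]:
    # Each column is described only by its storage type; the field metadata
    # (and with it the extension name) is not carried along.
    return [ExportedArray(f.storage_type, {}) for f in fields]


def export_with_schema(fields: List[Field]) -> List[ExportedArray]:
    # Exporting the columns together with their fields keeps each column's
    # metadata next to its data (copied so the export is independent).
    return [ExportedArray(f.storage_type, dict(f.metadata)) for f in fields]


# A fixed-shape tensor column: storage is a fixed-size list of 6 floats, and
# the extension type lives only in the field metadata.
tensor = Field(
    "tensor",
    "fixed_size_list<float>[6]",
    {EXT_NAME: "arrow.fixed_shape_tensor", EXT_META: '{"shape":[2,3]}'},
)

per_column = export_columns_individually([tensor])
with_schema = export_with_schema([tensor])
print(per_column[0].metadata.get(EXT_NAME))   # None: looks like a plain list array
print(with_schema[0].metadata.get(EXT_NAME))  # arrow.fixed_shape_tensor
```

A real fix would presumably export the batch together with its schema (the C Data Interface, for example, treats a record batch as equivalent to a struct array whose child fields carry their own metadata), but that is an assumption about the fix, not something stated in this issue.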


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #4805: pyarrow module can't roundtrip tensor arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4805: pyarrow module can't roundtrip tensor arrays
URL: https://github.com/apache/arrow-rs/issues/4805

