Posted to issues@arrow.apache.org by "clarkzinzow (via GitHub)" <gi...@apache.org> on 2023/05/15 19:13:22 UTC

[GitHub] [arrow] clarkzinzow opened a new issue, #35599: [Python] Canonical fixed-shape tensor extension array/type is not picklable.

clarkzinzow opened a new issue, #35599:
URL: https://github.com/apache/arrow/issues/35599

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The [fixed-shape tensor extension type](https://arrow.apache.org/docs/python/extending_types.html#fixed-size-tensor) does not appear to be picklable. Given that pickling Arrow data is supported in general and is used in Python-centric systems such as Ray, supporting pickling for canonical extension types/arrays seems reasonable.
   
   ## Reproduction
   
   ```python
   pickle.loads(pickle.dumps(pa.fixed_shape_tensor(pa.int64(), (2, 2))))
   ```
   raises the error:
   ```
   KeyError                                  Traceback (most recent call last)
   File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()
   
   KeyError: 'extension<arrow.fixed_shape_tensor>'
   ```
   
   ```python
   import pickle
   import pyarrow as pa
   
   tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
   arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
   storage = pa.array(arr, pa.list_(pa.int32(), 4))
   tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
   pickle.loads(pickle.dumps(tensor_array))
   ```
   raises nearly the same error (the same `KeyError`, followed by a `ValueError`):
   ```
   KeyError                                  Traceback (most recent call last)
   File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()
   
   KeyError: 'extension<arrow.fixed_shape_tensor>'
   
   During handling of the above exception, another exception occurred:
   
   ValueError                                Traceback (most recent call last)
   Cell In[13], line 1
   ----> 1 pickle.loads(pickle.dumps(tensor_array))
   
   File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4800, in pyarrow.lib.type_for_alias()
   
   ValueError: No type alias for extension<arrow.fixed_shape_tensor>
   ```
   
   ## Environment
   
   - pyarrow 12.0.0
   - Python 3.9
   - macOS
   
   ## Possible Solution
   
   It seems like we might be able to implement `__reduce__` on [`FixedShapeTensorType`](https://github.com/apache/arrow/blob/2d76d9a526f9827283bb7dfac60715b6ad4aec34/python/pyarrow/types.pxi#L1511C12-L1587) such that it uses the `__arrow_ext_serialize__` serialization protocol? E.g.
   ```python
   def __reduce__(self):
       return FixedShapeTensorType.__arrow_ext_deserialize__, (self.storage_type, self.__arrow_ext_serialize__())
   ```
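
To illustrate the idea without depending on pyarrow internals, here is a minimal, pure-Python sketch of how `__reduce__` can delegate to a serialize/deserialize pair. `SketchTensorType` is a hypothetical stand-in, not pyarrow's `FixedShapeTensorType`; the JSON encoding and the use of a plain string as the storage type are assumptions made only for this example.

```python
import json
import pickle


class SketchTensorType:
    """Hypothetical stand-in for an extension type (not a pyarrow class)."""

    def __init__(self, storage_type, shape):
        # Assumption: the storage type is modelled as a plain string.
        self.storage_type = storage_type
        self.shape = tuple(shape)

    def __arrow_ext_serialize__(self):
        # Encode only the type parameters as bytes.
        return json.dumps({"shape": list(self.shape)}).encode()

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        # Rebuild the type from the storage type and the serialized parameters.
        params = json.loads(serialized.decode())
        return cls(storage_type, params["shape"])

    def __reduce__(self):
        # Ask pickle to reconstruct the object by calling the deserializer.
        return type(self).__arrow_ext_deserialize__, (
            self.storage_type,
            self.__arrow_ext_serialize__(),
        )

    def __eq__(self, other):
        return (
            isinstance(other, SketchTensorType)
            and self.storage_type == other.storage_type
            and self.shape == other.shape
        )


original = SketchTensorType("int64", (2, 2))
restored = pickle.loads(pickle.dumps(original))
```

In this sketch, `restored` is a new object that compares equal to `original`, because pickle stores a reference to the class-level deserializer plus its two arguments rather than the object's string representation.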
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche closed issue #35599: [Python] Canonical fixed-shape tensor extension array/type is not picklable.

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche closed issue #35599: [Python] Canonical fixed-shape tensor extension array/type is not picklable.
URL: https://github.com/apache/arrow/issues/35599




[GitHub] [arrow] AlenkaF commented on issue #35599: [Python] Canonical fixed-shape tensor extension array/type is not picklable.

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #35599:
URL: https://github.com/apache/arrow/issues/35599#issuecomment-1549063885

   Thank you for the issue @clarkzinzow!
   You are correct. I think it is reasonable to implement the `__reduce__` method as you suggested.
   
   Are you interested in making a PR with the proposed solution and a test for it?




[GitHub] [arrow] jorisvandenbossche commented on issue #35599: [Python] Canonical fixed-shape tensor extension array/type is not picklable.

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35599:
URL: https://github.com/apache/arrow/issues/35599#issuecomment-1549248678

   > It seems like we might be able to implement `__reduce__` on [`FixedShapeTensorType`](https://github.com/apache/arrow/blob/2d76d9a526f9827283bb7dfac60715b6ad4aec34/python/pyarrow/types.pxi#L1511C12-L1587) such that it uses the `__arrow_ext_serialize__` serialization protocol?
   
   We might even be able to put this on the base ExtensionType class, so that every extension type implementation automatically has this implemented (since this should be generic).
   
   The current error occurs because pickling falls back to the reducer on the base class `DataType`, which essentially pickles the string repr of the type (and many type subclasses override that repr):
   
   https://github.com/apache/arrow/blob/f6e447944f2a2ab108d5971daf351b7443bc96fb/python/pyarrow/types.pxi#L225-L226
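
The failure mode described above can be modelled in a few lines of plain Python. This is a simplified sketch, assuming the base reducer pickles only `str(self)` and resolves it through an alias table; `SketchDataType`, `SketchExtensionType`, `_ALIASES`, and this `type_for_alias` are hypothetical stand-ins, not the pyarrow implementations.

```python
import pickle

# Assumption: only built-in type names are registered as aliases.
_ALIASES = {"int32": "int32", "int64": "int64"}


def type_for_alias(name):
    """Stand-in for the alias lookup that appears in the traceback."""
    try:
        return SketchDataType(_ALIASES[name])
    except KeyError:
        raise ValueError(f"No type alias for {name}")


class SketchDataType:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __reduce__(self):
        # Base reducer: pickle only the string repr of the type.
        return type_for_alias, (str(self),)


class SketchExtensionType(SketchDataType):
    def __str__(self):
        # Overridden repr that has no entry in the alias table.
        return f"extension<{self.name}>"


def roundtrip(obj):
    return pickle.loads(pickle.dumps(obj))
```

Here `roundtrip(SketchDataType("int64"))` succeeds, while `roundtrip(SketchExtensionType("arrow.fixed_shape_tensor"))` fails at load time: the pickled string `extension<arrow.fixed_shape_tensor>` is not a known alias, so the lookup raises `KeyError` and then `ValueError`, in the same order as the traceback in the report.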
   
   
   
   

