Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/22 17:33:10 UTC

[GitHub] [arrow] rabernat commented on issue #4802: pyarrow / pandas support for tensors (multi-dimensional arrays)

rabernat commented on issue #4802:
URL: https://github.com/apache/arrow/issues/4802#issuecomment-1106721345

   I found this issue while investigating how we might round-trip Xarray datasets (which contain many ndarrays) through Arrow.
   
   My impression from the recent comments is that Arrow is really not meant for this. However, given the strong growth of Arrow, many people are nevertheless trying to put ndarrays into it. Huggingface datasets, for example, [uses an ExtensionArray](https://github.com/huggingface/datasets/blob/9f2ff14673cac1f1ad56d80221a793f5938b68c7/src/datasets/features/features.py#L585-L641) to store ndarrays in Arrow. Would you say that is aligned with best practices?
   
   @NightMachinary and others may want to consider using [Zarr](https://zarr.readthedocs.io/en/stable/) as an efficient storage format for nd-arrays.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org