Posted to github@arrow.apache.org by "lhoestq (via GitHub)" <gi...@apache.org> on 2023/02/02 12:14:51 UTC

[GitHub] [arrow] lhoestq commented on a diff in pull request #33925: GH-33923: [Docs] Tensor canonical extension type specification

lhoestq commented on code in PR #33925:
URL: https://github.com/apache/arrow/pull/33925#discussion_r1094440818


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -72,4 +72,30 @@ same rules as laid out above, and provide backwards compatibility guarantees.
 Official List
 =============
 
-No canonical extension types have been standardized yet.
+Fixed shape tensor
+==================
+
+* Extension name: `arrow.fixed_shape_tensor`.
+
+* The storage type of the extension: ``FixedSizeList`` where:

Review Comment:
   Just throwing out ideas here; please ignore this if it doesn't make sense. I'm not familiar enough with the constraints on canonical extension types.
   
   What if, in addition to the extension type with `shape` and `value_type`, there were an extension array that stores the `FixedSizeList` together with `strides` (the more general way to interpret NumPy array storage, if I understand correctly), plus an offset/length in case the array is sliced (since slicing the storage may not be trivial)? Dimension names could be optional in the extension type for computer vision folks.
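A minimal pure-Python sketch of what per-array `strides` plus a slice offset would mean when reading one element out of the flat `FixedSizeList` child values. It rests on several assumptions: strides are counted in elements rather than bytes (NumPy counts bytes), the offset is counted in whole tensors, and `c_strides`, `f_strides` and `tensor_element` are hypothetical helpers. None of them are part of any Arrow API or of the proposal itself.

```python
from math import prod


def c_strides(shape):
    """Row-major (C-order) strides, counted in elements rather than bytes."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def f_strides(shape):
    """Column-major (Fortran-order) strides, counted in elements."""
    strides, step = [], 1
    for dim in shape:
        strides.append(step)
        step *= dim
    return tuple(strides)


def tensor_element(values, shape, strides, offset, tensor_idx, index):
    """Read element `index` of tensor `tensor_idx` from flat FixedSizeList values.

    Each tensor occupies prod(shape) consecutive child values; the hypothetical
    `offset` (in whole tensors) stands in for the slice offset of the array.
    """
    base = (offset + tensor_idx) * prod(shape)
    return values[base + sum(i * s for i, s in zip(index, strides))]


shape = (2, 3)
# The same logical tensor t[r][c] = 10 * r + c, stored in two memory layouts.
row_major = [0, 1, 2, 10, 11, 12]
col_major = [0, 10, 1, 11, 2, 12]
print(c_strides(shape), f_strides(shape))  # (3, 1) (1, 2)
print(tensor_element(row_major, shape, c_strides(shape), 0, 0, (1, 2)))  # 12
print(tensor_element(col_major, shape, f_strides(shape), 0, 0, (1, 2)))  # 12

# Slicing: three row-major tensors, with the array sliced to start at the second.
three = list(range(18))
print(tensor_element(three, shape, c_strides(shape), 1, 0, (1, 2)))  # 11
```

Both layouts return the same logical element, so only the strides would need to be stored next to the `FixedSizeList` to interpret either one.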
   
   That would let us read the Arrow array as a NumPy array with zero copy (and presumably into PyTorch as well). Making an Arrow array from a tensor whose storage doesn't fit a `FixedSizeList` would still require rewriting the storage, though.
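To illustrate the zero-copy part without assuming NumPy or PyArrow are installed, here is a small stdlib-only sketch. A contiguous buffer of doubles stands in, as an assumption, for the child values of a `FixedSizeList<double>` holding one 2x3 tensor. `memoryview.cast` exposes it as a 2-D view without copying. Note that `memoryview.cast` only produces C-contiguous views, so a column-major tensor would need real strides (as NumPy provides) or a rewrite of the storage.

```python
from array import array

# Stand-in for the flat child values of a FixedSizeList<double>[6] (one 2x3 tensor).
values = array("d", [0.0, 1.0, 2.0, 10.0, 11.0, 12.0])

# memoryview.cast requires one side of each cast to be a byte format, so go via "B".
# The resulting 2-D view shares memory with `values`; no data is copied.
view = memoryview(values).cast("B").cast("d", shape=[2, 3])

print(view.shape)   # (2, 3)
print(view[1, 2])   # 12.0

# A write to the original buffer shows up in the view, confirming it is zero copy.
values[5] = 99.0
print(view[1, 2])   # 99.0
```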
   
   This should also allow concatenating tensors with the same shape but different memory formats. On the other hand, I'm not sure it's possible to get a NumPy array with zero copy from tensors with mixed memory formats.
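A rough sketch of why mixed memory formats are the sticking point. A NumPy array has a single set of strides, so concatenating a row-major and a column-major tensor of the same shape into one uniformly laid-out buffer means copying at least one of them. The helper `to_row_major` is hypothetical and again counts strides in elements rather than bytes.

```python
from itertools import product


def to_row_major(values, shape, strides):
    """Copy one strided tensor into a new list in row-major (C) order.

    itertools.product varies the last index fastest, which is row-major order.
    """
    return [
        values[sum(i * s for i, s in zip(idx, strides))]
        for idx in product(*(range(d) for d in shape))
    ]


shape = (2, 3)
a = [0, 1, 2, 10, 11, 12]   # row-major, strides (3, 1): already in the target layout
b = [0, 10, 1, 11, 2, 12]   # column-major, strides (1, 2): has to be rewritten

# Only `b` actually changes order; `a` could in principle be reused as-is.
combined = to_row_major(a, shape, (3, 1)) + to_row_major(b, shape, (1, 2))
print(combined)  # [0, 1, 2, 10, 11, 12, 0, 1, 2, 10, 11, 12]
```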



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.