Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/09/06 13:17:03 UTC

[GitHub] [arrow] paleolimbot commented on a diff in pull request #37166: GH-24868: [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType

paleolimbot commented on code in PR #37166:
URL: https://github.com/apache/arrow/pull/37166#discussion_r1317269300


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,76 @@ Fixed shape tensor
   by this specification. Instead, this extension type lets one use fixed shape tensors
   as elements in a field of a RecordBatch or a Table.
 
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is: ``StructArray`` where struct
+  is composed of **data** and **shape** fields describing a single
+  tensor per row:
+
+  * **data** is a ``List`` holding tensor elements of a single tensor.
+    Data type of the list elements is uniform across the entire column
+    and also provided in metadata.
+  * **shape** is a ``FixedSizeList<uint32>[ndim]`` of the tensor shape where
+    the size of the list ``ndim`` is equal to the number of dimensions of the
+    tensor.

Review Comment:
   Yes, there is potentially a different `ndim` for each item in the array. I imagine that in practice this does not occur often, but at the time we resolve the Arrow output type we don't have any actual data to inspect. Opening that can of worms would be hard, but we might have to do it for other reasons, too (e.g., guessing decimal output precision/bitwidth, since that can also vary by row in Postgres).
   
   I don't want the discussion to get *too* hung up on this point if it makes life more difficult. If I had to choose between allowing `ndim` to vary among items in the array and consistency with the `fixed_shape_tensor()`, I would pick consistency with the fixed shape tensor! There are other considerations for returning arrays from Postgres to Arrow (for example, if `ndim` is 1, a more intuitive output type would be a plain `List`); my initial comment of "we won't be able to use this" is probably not true.
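   To make the `ndim` constraint concrete, here is a minimal sketch in plain Python (no Arrow dependency) of how nested-array rows, such as Postgres array values, could be split into the proposed **data** and **shape** struct fields. The function names are hypothetical, row-major flattening is an assumption, and NULL rows are not handled. Because **shape** is a `FixedSizeList<uint32>[ndim]`, `ndim` is part of the column type, so rows with different `ndim` cannot share one column.

```python
from typing import Any, Dict, List, Tuple


def infer_shape(value: Any) -> Tuple[int, ...]:
    """Guess a shape by following the first element at each nesting level."""
    shape: List[int] = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def matches_shape(value: Any, shape: Tuple[int, ...]) -> bool:
    """Return True only if `value` is a rectangular nested list of exactly `shape`."""
    if not shape:
        return not isinstance(value, list)
    return (
        isinstance(value, list)
        and len(value) == shape[0]
        and all(matches_shape(item, shape[1:]) for item in value)
    )


def flatten(value: Any) -> List[Any]:
    """Flatten a nested list in row-major (C) order (an assumption of this sketch)."""
    if not isinstance(value, list):
        return [value]
    out: List[Any] = []
    for item in value:
        out.extend(flatten(item))
    return out


def to_variable_shape_storage(rows: List[Any]) -> Dict[str, list]:
    """Hypothetical helper: split each row into `data` and `shape` fields."""
    data: List[List[Any]] = []
    shapes: List[Tuple[int, ...]] = []
    for row in rows:
        shape = infer_shape(row)
        if not matches_shape(row, shape):
            raise ValueError(f"ragged array cannot be a tensor: {row!r}")
        data.append(flatten(row))
        shapes.append(shape)
    # `shape` is FixedSizeList<uint32>[ndim]: one ndim for the whole column.
    ndims = sorted({len(s) for s in shapes})
    if len(ndims) > 1:
        raise ValueError(f"ndim varies between rows: {ndims}")
    return {"data": data, "shape": shapes}
```

   For example, `to_variable_shape_storage([[[1, 2], [3, 4]], [[5, 6, 7]]])` yields data `[[1, 2, 3, 4], [5, 6, 7]]` with shapes `[(2, 2), (1, 3)]`, while mixing a 1-D row with a 2-D row raises `ValueError`. A converter that must pick the column type before reading any rows would have to guess `ndim` up front.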


