You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/04/04 12:56:37 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #34883: GH-34882: [Python] Binding for FixedShapeTensorType

AlenkaF opened a new pull request, #34883:
URL: https://github.com/apache/arrow/pull/34883

   ### Rationale for this change
   In the C++ the fixed shape tensor canonical extension type is implementated https://github.com/apache/arrow/pull/8510 so we can add bindings to the extension type in Python.
   
   ### What changes are included in this PR?
   Binding for fixed shape tensor canonical extension type.
   
   ### Are these changes tested?
   Yes.
   
   ### Are there any user-facing changes?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159603352


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   ```suggestion
           numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape))
   ```
   
   (order="C" is the default so not needed to specify, and reading the docs it's also not specifying the output order, but only about how to index into the input array. And since our input is 1D, the order is not relevant)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503207748

   I suggest we then first use the 3rd option: "Return always the actual C-contiguous array and raise error if permutation is non-trivial" and as Joris mentioned, finetune it in a follow-up PR (after the work on `to/from_tensor` in C++ is finished).
   
   The check for `to_numpy_ndarray` was added with the last commit: https://github.com/apache/arrow/pull/34883/commits/1ebb8297773494813f5515215cb869eb0ea959f6
   
   @rok do you agree with the current state of this PR? If yes we could merge it today and get it into 12.0.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1160513413


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   > No, the `self.type.permutation` is only that of the individual tensor elements (given that you have a FixedShapeTensorArray, by definition the first dimension of the n+1 dim ndarray is always the length of the array, and can't be permuted)
   
   Yes, thank you for confirming the thought I was currently struggling with!
   Then I do not see the need for raising an error (as it would be raised for every permutation that is not `None`) but maybe make it explicit in the docstrings that the "0" dimension is by default not permutable and is fixed (see the examples and description I added in the docs PR for this binding: https://github.com/apache/arrow/pull/34957/files)
   
   > Yes, and as mentioned before, I think we should certainly add examples how the user can do this (if we decide to not automatically do it in `to_numpy_ndarray`)
   
   Done: https://github.com/apache/arrow/pull/34957/files
   
   > I wouldn't raise a warning for that, since it's a fact of life that numpy arrays don't have dimension names, so that's just a consequence of calling this method. We can mention that in the docstring, though, to be explicit.
   
   Agree, will make it explicit in the docstrings.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162142218


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   > 2. `to_numpy_ndarray` is reshaping 1-D array so there is no need to check the layout style as 1-D array is always both C-contiguous and F-contiguous.
   
   While that's true for the physical layout it's not necessarily true for the logical layout. If permutation is non-trivial tenor will not be laid out in memory in C-contiguous way. I propose (https://github.com/apache/arrow/pull/34883#pullrequestreview-1378208716) we block converting tensors with non-trivial permutations and add correct logic to handle those later. (Another option is to go via `array.ToTensor().to_numpy()` once [`FromTensor/ToTensor`](https://github.com/apache/arrow/pull/34797) is merged.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #34883:
URL: https://github.com/apache/arrow/pull/34883


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159613216


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   What happens in cases where permutation doesn't give C style layout?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159620598


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   A permutation would happen after the reshape, I think? (so the initial reshape is always C contiguous) And that is left to the user for now in this method (which we should document)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159630434


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    @staticmethod
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        The first dimension of ndarray will become the length of the fixed
+        shape tensor array.
+
+        Numpy array needs to be C-contiguous in memory
+        (``obj.flags["C_CONTIGUOUS"]==True``).
+
+        Parameters
+        ----------
+        obj : ndarray
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> import numpy as np
+        >>> arr = np.array(
+        ...         [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        ...         dtype=np.float32)
+        >>> pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+        <pyarrow.lib.FixedShapeTensorArray object at ...>
+        [
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ],
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ]
+        ]
+        """
+        if obj.flags["F_CONTIGUOUS"]:
+            raise ValueError('The data in the numpy array need to be in a single, '
+                             'C-style contiguous segment.')

Review Comment:
   Probably best to have this discussion in #34797 indeed. As above I'd check if permutation is trivial and throw otherwise to ensure correctness and merge this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503756289

   > What do you mean exactly with "pass the permutation/strides information"? (I suppose you mean returning the permuted/logical array by default?)
   
   I think so yes :D.
   
   > Users could indeed do this conversion to an nD array manually (the `to_numpy_ndarray` implementation isn't that long), but to me that feels a bit the point of providing this method ;) (it could be a keyword argument to the function, to indicate if you want the permuted logical array or not)
   
   I suppose having this as an opt-in behavior (with the keyword) would work yeah. Just want to make sure this is not on a default path as that would be surprising.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1502853636

   Thank you for summing it up Joris!
   
   My vote is for option 2 ("Provide option to indicate if you want the permuted result or not") and I also do not think option 1 or 3 are good ways to go - for same reasons already mentioned.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503146111

   You don't necessarily "loose" that information, you still have it available in the extension type. It won't be embedded in the numpy array, but neither do the dimension names (of course, I know this is a bit different, as names can never be embedded in a numpy array). So in general you might still need to process the extension type attributes to ensure you correctly handle the resulting numpy array. 
   
   For example, assume you have channels-last array (logically NCHW, physically NHWC). This information could also be stored as `"dim_names": ["H", "W", "C"]`, or only as permutation ([2, 0, 1]), or both. So if you convert such a array to numpy, you still need to check for those aspects as well, and also need to take into account that if you already got a permutated array by default from `to_numpy_ndarray`, the dim_names were not permuted.  
   (this is different compared to a Tensor, where the permuted dim names can be attached to the Tensor object)
   
   Just to mention that there are various related aspects that can make this more complex. And so I still think it is a _valid_ option to say for `to_numpy_ndarray` that it will just simply ignore all optional metadata (and thus always return the physical array), and it is up to the user to decide what to do with the optional metadata (permutation and/or dim_names).
   
   Now, as mentioned above, I don't have a that strong opinion about it, so I am also fine if in the end `to_numpy_ndarray` permutes by default (in that case I think I would prefer adding an keyword to turn it off, since it's not that straightforward to convert the permuted array to its physical form). 
   
   On the short term, erroring for those ambiguous cases is also fine, that leaves all options open for finetuning this in a follow-up PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1502986824

   > One question is: are there good use cases for not wanting the automatic permutation? Because if so, always returning the permuted array (option 1) is also annoying because then it's less trivial to go back to the plain C array, instead of permuting the C array?
   
   I think we would always want automatic permutation. The data being passed will always be passed zero-copy so users could always choose to interpret it with different strides/permutation, but the expectation is they would not want to do that in 99% of cases. Loosing strides/permutation information is a sharp edge imo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159601100


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   It doesn't guarantee to be zero-scopy in all cases, but in this simple case of reshaping a 1D array to an nD array, this is always zero copy AFAIK



##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   It doesn't guarantee to be zero-copy in all cases, but in this simple case of reshaping a 1D array to an nD array, this is always zero copy AFAIK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159627221


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   Right, but permutation would not roundtrip to numpy object and back. I'd prefer to throw NotImplemented if permutation is not trivial (C).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158186356


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1130,13 +1131,32 @@ def test_cpp_extension_in_python(tmpdir):
 
 
 def test_tensor_type():
-    tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 3))
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
     assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
     assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert not tensor_type.dim_names
+    assert not tensor_type.permutation

Review Comment:
   What is the exact return value? (I assume not False, but None or an empty list?) 
   If it's an empty list, we should maybe check in the bindings for the attribute whether the underlying C++ vector is empty, and in that case return None instead? For a python API, that feels nicer to indicate "not defined", and also ensures that _if_ it returns a list, it is guaranteed to return a list of the same length as `shape`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1161913669


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   After couple of days to think it over, I would agree with all the comments from Joris at the beginning of the thread.
   
   1. The use of permutation to reorder data should (for now) be left to the user to decide if they want to do that or not. I have added a note in the docstrings: https://github.com/apache/arrow/pull/34883/commits/dd8fd31f4b15ca2e0a39673ab2a338b991cd2032
   2. `to_numpy_ndarray` is reshaping 1-D array so there is no need to check the layout style as 1-D array is always both C-contiguous and F-contiguous.
   3. We cannot create a F-contiguous tensor in PyArrow and there is a check for row layout in `from_numpy_ndarray` so there is no option, currently, to have a F-contiguous tensors in the tensor array anyways.
   
   With this I propose to leave the code as is and maybe open an issue later to discuss changes/features further.
   
   I have also added a PR for the documentation of the `fixed_shape_tensor` binding here: https://github.com/apache/arrow/pull/34957. After the binding PR is merged I will run the code again, build the docs locally to check the html and then mark the PR as ready for review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503176409

   Just to be clear - `pyarrow->numpy` conversion should (imo) use permutation to create ndarray with correct strides while using the same physical layout (zero-copy) of the data buffer. I don't see any good reason to not do this in pyarrow but I might be missing something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503028476

   An example of `pyarrow -> numpy` conversion that loses permutaion/strides:
   ```python
   import pyarrow as pa
   from numpy.testing import assert_array_equal
   
   tensor_type1 = pa.fixed_shape_tensor(pa.int8(), [2, 2, 3], permutation=[0, 1, 2])
   tensor_type2 = pa.fixed_shape_tensor(pa.int8(), [2, 2, 3], permutation=[2, 1, 0])
   storage = pa.array([[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]], pa.list_(pa.int8(), 12))
   arr1 = pa.ExtensionArray.from_storage(tensor_type1, storage)
   arr2 = pa.ExtensionArray.from_storage(tensor_type1, storage)
   narr1 = arr1.to_numpy_ndarray()
   narr2 = arr2.to_numpy_ndarray()
   
   assert_array_equal(narr1, narr2)
   assert arr1.equals(arr2)
   ```
   
   If `arr2.to_numpy_ndarray()` would throw an error saying this currently (we'd want to add it at some point) can't handle permuted conversion I think we'd be ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1502846716

   On the topic of taking the permutation into account or not, I personally don't have a strong opinion what to do by default, but to list the different options:
   
   1. Return always the permuted array (so non C-contiguous is permutation is non-trivial)
   2. Provide option to indicate if you want the permuted result or not (e.g. `permuted=True/False`)
     a) with returning permuted array as default
     b) with returning the actual C-contiguous array by default 
   3. Return always the actual C-contiguous array and raise error if permutation is non-trivial
   4. Return always the actual C-contiguous array and just ignore permutation info (and clearly document this, leave it to the user to permute if they want)
   
   One question is: are there good use cases for _not_ wanting the automatic permutation? Because if so, always returning the permuted array (option 1) is also annoying because then it's less trivial to go back to the plain C array, instead of permuting the C array? 
   Personally I don't like option 3, because that restricts the user to actually get the ndarray, even if they are fine with ignoring any permutation info.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158516896


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,89 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation is None
+
+    tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3],
+                                        permutation=[0, 2, 1])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.float64(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation == [0, 2, 1]
+
+    tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3],
+                                        dim_names=['C', 'H', 'W'])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.bool_(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names == ['C', 'H', 'W']
+    assert tensor_type.permutation is None
+
+
+@pytest.mark.parametrize("numpy_order", ('C', 'F'))
+def test_tensor_class_methods(numpy_order):
+    tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
+    storage = pa.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]],
+                       pa.list_(pa.float32(), 6))
+    arr = pa.ExtensionArray.from_storage(tensor_type, storage)
+    expected = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]], dtype=np.float32)
+
+    result = arr.to_numpy_ndarray()
+    np.testing.assert_array_equal(result, expected)
+
+    arr = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        dtype=np.float32, order=numpy_order)
+    tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+    assert isinstance(tensor_array_from_numpy.type, pa.FixedShapeTensorType)
+    assert tensor_array_from_numpy.type.value_type == pa.float32()
+    assert tensor_array_from_numpy.type.shape == [2, 3]

Review Comment:
   Currently it handles C- or F-contiguous numpy arrays the same (meaning the result will be the same). Do you think we should give an error if non-C-contiguous array is passed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159603634


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    @staticmethod
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        The first dimension of ndarray will become the length of the fixed
+        shape tensor array.
+
+        Numpy array needs to be C-contiguous in memory
+        (``obj.flags["C_CONTIGUOUS"]==True``).
+
+        Parameters
+        ----------
+        obj : ndarray

Review Comment:
   ```suggestion
           obj : numpy.ndarray
   ```



##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    @staticmethod
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        The first dimension of ndarray will become the length of the fixed
+        shape tensor array.
+
+        Numpy array needs to be C-contiguous in memory
+        (``obj.flags["C_CONTIGUOUS"]==True``).
+
+        Parameters
+        ----------
+        obj : ndarray
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> import numpy as np
+        >>> arr = np.array(
+        ...         [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        ...         dtype=np.float32)
+        >>> pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+        <pyarrow.lib.FixedShapeTensorArray object at ...>
+        [
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ],
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ]
+        ]
+        """
+        if obj.flags["F_CONTIGUOUS"]:

Review Comment:
   ```suggestion
           if not obj.flags["C_CONTIGUOUS"]:
   ```
   
   An array can be neither F nor C contiguous, so checking that has to be C is better here



##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    @staticmethod
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        The first dimension of ndarray will become the length of the fixed
+        shape tensor array.
+
+        Numpy array needs to be C-contiguous in memory
+        (``obj.flags["C_CONTIGUOUS"]==True``).
+
+        Parameters
+        ----------
+        obj : ndarray
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> import numpy as np
+        >>> arr = np.array(
+        ...         [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        ...         dtype=np.float32)
+        >>> pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+        <pyarrow.lib.FixedShapeTensorArray object at ...>
+        [
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ],
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ]
+        ]
+        """
+        if obj.flags["F_CONTIGUOUS"]:
+            raise ValueError('The data in the numpy array need to be in a single, '
+                             'C-style contiguous segment.')

Review Comment:
   We could maybe first resolve the discussion in #34797 before trying to add it here? (it's not strictly needed to already make this useful)
   
   There are also two possible behaviours: make a copy to ensure C contiguous preserving the shape, or reshape to get a C contiguous view (but with different shape). 
   Leaving it as is for now leaves this choice to the user to do this explicitly (we could add examples for both). Or could add a keyword to control how this is done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158548398


##########
python/pyarrow/public-api.pxi:
##########
@@ -118,6 +119,8 @@ cdef api object pyarrow_wrap_data_type(
         cpy_ext_type = dynamic_cast[_CPyExtensionTypePtr](ext_type)
         if cpy_ext_type != nullptr:
             return cpy_ext_type.GetInstance()
+        elif ext_type.extension_name() == tensor_name:

Review Comment:
   Works, and it is so much nicer!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159690915


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   Raising an error for now sounds fine to me 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159572171


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   Does reshape incur a copy? Perhaps we should throw if `permutation[0] != 0` (as that would mean array can't be chunked)?
   Also shouldn't we take permutation into account when reshaping?



##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    @staticmethod
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        The first dimension of ndarray will become the length of the fixed
+        shape tensor array.
+
+        Numpy array needs to be C-contiguous in memory
+        (``obj.flags["C_CONTIGUOUS"]==True``).
+
+        Parameters
+        ----------
+        obj : ndarray
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> import numpy as np
+        >>> arr = np.array(
+        ...         [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        ...         dtype=np.float32)
+        >>> pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+        <pyarrow.lib.FixedShapeTensorArray object at ...>
+        [
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ],
+          [
+            1,
+            2,
+            3,
+            4,
+            5,
+            6
+          ]
+        ]
+        """
+        if obj.flags["F_CONTIGUOUS"]:
+            raise ValueError('The data in the numpy array need to be in a single, '
+                             'C-style contiguous segment.')

Review Comment:
   I don't remember of the top of my head how strides work in numpy, but could we extract permutation at this point and use it instead of forcing C-style? (see proposal for FromTensor for what I mean: https://github.com/apache/arrow/pull/34797/files#diff-47327ebe34666ff7683a620f0210f7d1a1388121ba787cc5ea7cc113f4f66dadR154-R165)



##########
python/pyarrow/types.pxi:
##########
@@ -4543,6 +4623,105 @@ def run_end_encoded(run_end_type, value_type):
     return pyarrow_wrap_data_type(ree_type)
 
 
+def fixed_shape_tensor(DataType value_type, shape, dim_names=None, permutation=None):
+    """
+    Create instance of fixed shape tensor extension type with shape and optional
+    names of tensor dimensions and indices of the desired ordering.

Review Comment:
   ```suggestion
       names of tensor dimensions and indices of the desired logical
       ordering of dimensions.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503499447

   > `pyarrow->numpy` conversion should (imo) use permutation data to create ndarray with correct strides while using the same physical layout (zero-copy) of the data buffer. I don't see any good reason to not do this in pyarrow but I might be missing something?
   
   @rok see my comments above? 
   _Potential_ reasons that someone might not want to have the permuted ndarray:
   
   - For a specific use case, someone might not care about the permutation and just wants the physical C-contiguous array
   - Because you want to do the handling of permutation / dim_names yourself, and therefore first want to get the numpy array as the physical array (see https://github.com/apache/arrow/pull/34883#issuecomment-1503146111 above for a possible reason for this, eg if you need to handle the potential combination of both permutation and/or dim_names, this might be easier to just all do yourself)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503728907

   > > For a specific use case, someone might not care about the permutation and just wants the physical C-contiguous array
   > 
   > Wouldn't that not be possible with `tensor_ext_arr.storage.to_numpy()`? Or would that not be zero-copy?
   
   `to_numpy()` gives a 1D object-dtype array of arrays, so that's not convenient to work with if you want the nD array (it can actually be "zero-copy", but you can't get the zero-copy nD array from it)
   
   > How about we have pyarrow pass the permutation/strides information by default and have users use lower level API for manual handling of permutation / dim_names?
   
   What do you mean exactly with "pass the permutation/strides information"? (I suppose you mean returning the permuted/logical array by default?) 
   Users could indeed do this conversion to an nD array manually (the `to_numpy_ndarray` implementation isn't that long), but to me that feels a bit the point of providing this method ;) (it could be a keyword argument to the function, to indicate if you want the permuted logical array or not)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162369729


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,113 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+
+        Note: The method doesn't take into account ``permutation`` and ``dim_names``
+        parameter therefore this information will be lost.
+        """

Review Comment:
   Trivial permutation should be equal to None (no permutation), I think? Then we could just check for non None values?
   
   Also, I feel this is to strict as the user might want to use the information about the permutation themselves and permute the ndarray afterwords like I have suggested in the docs example: https://github.com/apache/arrow/blame/902aa069002e8a470a5f19d5e0137a600e85b66b/docs/source/python/extending_types.rst#L466-L538 (which they can not do if we throw an error).
   
   You mention cases where information about the ordering could be lost and the user would have no way of correcting it - I will try to find a specific case that would support this so I am sure we are on the same page and we could have a better idea about what would be best.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162137978


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,113 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+
+        Note: The method doesn't take into account ``permutation`` and ``dim_names``
+        parameter therefore this information will be lost.
+        """

Review Comment:
   Let's block non-trival conversions until we have the logic to do it correctly.
   ```suggestion
           """
           if not all(x < y for x, y in zip(self.type.permutation, self.type.permutation[1:]):
             raise ValueError('Only non-permuted tensors can be converted to numpy tensors.')
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1505223644

   Benchmark runs are scheduled for baseline = e488942cd552ac36a46d40477c1b0326a626ed98 and contender = d21c1c79c0d6377d035bd90da5d5d09e04e49079. d21c1c79c0d6377d035bd90da5d5d09e04e49079 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/6c4a12f957c2470ba7b886c87113f3dd...80d882993b574bcf898055913a169a6e/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ed325b9e3f4f425a9aad14b16b6658cb...de71aa35c7e640bfab4f9519c86df0c0/)
   [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8684967004a14b21bbf8668612b9dd2b...dd74534ccd614c0eaebfe87bc8232d87/)
   [Finished :arrow_down:0.09% :arrow_up:0.06%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/5dfca74eb17c439ca92ba53034978d16...3de9ac2e2cf942d99b407bf55a412691/)
   Buildkite builds:
   [Finished] [`d21c1c79` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2677)
   [Failed] [`d21c1c79` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2710)
   [Finished] [`d21c1c79` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2676)
   [Finished] [`d21c1c79` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2703)
   [Finished] [`e488942c` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2676)
   [Failed] [`e488942c` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2709)
   [Finished] [`e488942c` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2675)
   [Finished] [`e488942c` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2702)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158473676


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,89 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation is None
+
+    tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3],
+                                        permutation=[0, 2, 1])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.float64(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation == [0, 2, 1]
+
+    tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3],
+                                        dim_names=['C', 'H', 'W'])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.bool_(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names == ['C', 'H', 'W']
+    assert tensor_type.permutation is None
+
+
+@pytest.mark.parametrize("numpy_order", ('C', 'F'))
+def test_tensor_class_methods(numpy_order):
+    tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
+    storage = pa.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]],
+                       pa.list_(pa.float32(), 6))
+    arr = pa.ExtensionArray.from_storage(tensor_type, storage)
+    expected = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]], dtype=np.float32)
+
+    result = arr.to_numpy_ndarray()
+    np.testing.assert_array_equal(result, expected)
+
+    arr = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        dtype=np.float32, order=numpy_order)
+    tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+    assert isinstance(tensor_array_from_numpy.type, pa.FixedShapeTensorType)
+    assert tensor_array_from_numpy.type.value_type == pa.float32()
+    assert tensor_array_from_numpy.type.shape == [2, 3]

Review Comment:
   Can you also test the roundtrip here? Converting back to the ndarray (to_numpy_ndarray), and comparing with original. 
   
   I am also not sure this is currently correct for non-C-contiguous arrays. In `from_numpy_ndarray`, I think we currently just assume we get a C contiguous (row major) one, since that is what the spec assumes. Passing a non-C-contiguous just as is might give unexpected results.



##########
python/pyarrow/array.pxi:
##########
@@ -3090,6 +3090,69 @@ cdef class ExtensionArray(Array):
         return self.storage.to_numpy(**kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = FixedShapeTensorType(pa.int32(), [2, 2])

Review Comment:
   This needs to be updated now



##########
python/pyarrow/array.pxi:
##########
@@ -3090,6 +3090,69 @@ cdef class ExtensionArray(Array):
         return self.storage.to_numpy(**kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = FixedShapeTensorType(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.

Review Comment:
   Need to add a Parameters section to make linting happy
   
   I would also add a bit more explanation about that the first dimension becomes the length of the tensor array etc



##########
python/pyarrow/types.pxi:
##########
@@ -4543,6 +4623,100 @@ def run_end_encoded(run_end_type, value_type):
     return pyarrow_wrap_data_type(ree_type)
 
 
+def fixed_shape_tensor(DataType value_type, shape, dim_names=None, permutation=None):
+    """
+    Create instance of fixed shape tensor extension type with shape and optional
+    names of tensor dimensions and indices of the desired ordering.
+
+    Parameters
+    ----------
+    value_type : DataType
+        Data type of individual tensor elements.
+    shape : tuple or list of integers
+        The physical shape of the contained tensors.
+    dim_names : tuple or list of strings, default None
+        Explicit names to tensor dimensions.
+    permutation : tuple or list integers, default None
+        Indices of the desired ordering of the original dimensions.
+
+    Examples
+    --------
+    Create an instance of fixed shape tensor extension type:
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+    >>> tensor_type
+    FixedShapeTensorType(extension<arrow.fixed_shape_tensor>)
+
+    Inspect the data type:
+
+    >>> tensor_type.value_type
+    DataType(int32)
+    >>> tensor_type.shape
+    [2, 2]
+
+    Create a table with fixed shape tensor extension array:
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> tensor = pa.ExtensionArray.from_storage(tensor_type, storage)
+    >>> pa.table([tensor], names=["tensor_array"])
+    pyarrow.Table
+    tensor_array: extension<arrow.fixed_shape_tensor>
+    ----
+    tensor_array: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+
+    Create an instance of fixed shape tensor extension type with names
+    of tensor dimensions:
+
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int8(), (2, 2, 3),
+    ...                                     dim_names=['C', 'H', 'W'])
+    >>> tensor_type.dim_names
+    [b'C', b'H', b'W']

Review Comment:
   ```suggestion
       ['C', 'H', 'W']
   ```



##########
python/pyarrow/includes/libarrow.pxd:
##########
@@ -2619,6 +2619,31 @@ cdef extern from "arrow/extension_type.h" namespace "arrow":
         shared_ptr[CArray] storage()
 
 
+cdef extern from "arrow/extension/fixed_shape_tensor.h" namespace "arrow::extension":
+    cdef cppclass CFixedShapeTensorType \
+            " arrow::extension::FixedShapeTensorType"(CExtensionType):
+
+        @staticmethod
+        CResult[shared_ptr[CDataType]] Make(const shared_ptr[CDataType]& value_type,
+                                            const vector[int64_t]& shape,
+                                            const vector[int64_t]& permutation,
+                                            const vector[c_string]& dim_names)
+
+        CResult[shared_ptr[CDataType]] Deserialize(const shared_ptr[CDataType] storage_type,
+                                                   const c_string& serialized_data) const
+
+        c_string Serialize() const
+
+        const shared_ptr[CDataType] value_type()
+        const vector[int64_t] shape()
+        const vector[int64_t] permutation()
+        const vector[c_string] dim_names()
+
+        CFixedShapeTensorType(shared_ptr[CDataType]& value_type, int32_t& size,
+                              vector[int64_t]& shape, vector[int64_t]& permutation,
+                              vector[c_string]& dim_names)

Review Comment:
   This is not actually used at the moment?



##########
python/pyarrow/types.pxi:
##########
@@ -4543,6 +4623,100 @@ def run_end_encoded(run_end_type, value_type):
     return pyarrow_wrap_data_type(ree_type)
 
 
+def fixed_shape_tensor(DataType value_type, shape, dim_names=None, permutation=None):
+    """
+    Create instance of fixed shape tensor extension type with shape and optional
+    names of tensor dimensions and indices of the desired ordering.
+
+    Parameters
+    ----------
+    value_type : DataType
+        Data type of individual tensor elements.
+    shape : tuple or list of integers
+        The physical shape of the contained tensors.
+    dim_names : tuple or list of strings, default None
+        Explicit names to tensor dimensions.
+    permutation : tuple or list integers, default None
+        Indices of the desired ordering of the original dimensions.

Review Comment:
   I would expand this explanation a bit (I think it's fine to copy part of the explanation of this parameter in the spec)



##########
python/pyarrow/public-api.pxi:
##########
@@ -118,6 +119,8 @@ cdef api object pyarrow_wrap_data_type(
         cpy_ext_type = dynamic_cast[_CPyExtensionTypePtr](ext_type)
         if cpy_ext_type != nullptr:
             return cpy_ext_type.GetInstance()
+        elif ext_type.extension_name() == tensor_name:

Review Comment:
   Just wondering, does it work as well to just do here `.. == b"arrow.fixed_shape_tensor"` (that would avoid having to define the name earlier in the code, which is a bit easier to read)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162142218


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   > 2. `to_numpy_ndarray` is reshaping 1-D array so there is no need to check the layout style as 1-D array is always both C-contiguous and F-contiguous.
   
   While that's true for the physical layout it's not necessarily true for the logical layout. I propose (https://github.com/apache/arrow/pull/34883#pullrequestreview-1378208716) we block converting tensors with non-trivial permutations and add correct logic to handle those later. (Another option is to go via `array.ToTensor().to_numpy()` once [`FromTensor/ToTensor`](https://github.com/apache/arrow/pull/34797) is merged.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162420417


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,113 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+
+        Note: The method doesn't take into account ``permutation`` and ``dim_names``
+        parameter therefore this information will be lost.
+        """

Review Comment:
   Ok, will add 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158194346


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1130,13 +1131,32 @@ def test_cpp_extension_in_python(tmpdir):
 
 
 def test_tensor_type():
-    tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 3))
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
     assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
     assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert not tensor_type.dim_names
+    assert not tensor_type.permutation

Review Comment:
   I think it returns `None`, will check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158112172


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,86 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 3))
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)

Review Comment:
   Can you test all the custom attributes here as well? (shape, permutation, dim_names, value_type)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158536564


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,89 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation is None
+
+    tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3],
+                                        permutation=[0, 2, 1])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.float64(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation == [0, 2, 1]
+
+    tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3],
+                                        dim_names=['C', 'H', 'W'])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.bool_(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names == ['C', 'H', 'W']
+    assert tensor_type.permutation is None
+
+
+@pytest.mark.parametrize("numpy_order", ('C', 'F'))
+def test_tensor_class_methods(numpy_order):
+    tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
+    storage = pa.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]],
+                       pa.list_(pa.float32(), 6))
+    arr = pa.ExtensionArray.from_storage(tensor_type, storage)
+    expected = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]], dtype=np.float32)
+
+    result = arr.to_numpy_ndarray()
+    np.testing.assert_array_equal(result, expected)
+
+    arr = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        dtype=np.float32, order=numpy_order)
+    tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+    assert isinstance(tensor_array_from_numpy.type, pa.FixedShapeTensorType)
+    assert tensor_array_from_numpy.type.value_type == pa.float32()
+    assert tensor_array_from_numpy.type.shape == [2, 3]

Review Comment:
   Hmm, OK, so checking the docstring of `ndarray.flatten()` (which is currently used under the hood), that actually ensures to return C contiguous data, and so will make a copy if needed (if the original array is not C contiguous). So this will indeed work correctly, it will just not be zero-copy in that case (while users might expect this). 
   
   If we keep it, we should at least explicitly document this (that the method is only zero copy if the passed data is row-major). Alternatively, on the short term we would only allow row major data to avoid surprises (and we can always expand it later)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1157448809


##########
python/pyarrow/types.pxi:
##########
@@ -1494,6 +1494,135 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return ExtensionScalar
 
+
+cdef class FixedShapeTensorType(BaseExtensionType):
+    """
+    Concrete class for fixed shape tensor extension type.
+
+    Parameters
+    ----------
+    value_type : DataType
+        Data type of individual tensor elements.
+    shape : tuple
+        The physical shape of the contained tensors.
+    dim_names : tuple
+        Explicit names to tensor dimensions.
+    permutation : tuple
+        Indices of the desired ordering of the original dimensions.
+
+    Examples
+    --------
+    >>> import pyarrow as pa
+
+    Create fixed shape tensor extension type:
+
+    >>> tensor_type = pa.FixedShapeTensorType(pa.int32(), [2, 2])
+    >>> tensor_type
+    FixedShapeTensorType(extension<arrow.fixed_shape_tensor>)
+
+    Inspect the data type:
+
+    >>> tensor_type.value_type
+    DataType(int32)
+    >>> tensor_type.shape
+    [2, 2]
+
+    Create a fixed shape tensor extension type with names of tensor dimensions:
+
+    >>> tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 2, 3), dim_names=['C', 'H', 'W'])
+    >>> tensor_type.dim_names
+    [b'C', b'H', b'W']
+
+    Create a fixed shape tensor extension type with permutation:
+    >>> tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 2, 3), permutation=[0, 2, 1])
+    >>> tensor_type.permutation
+    [0, 2, 1]
+    """
+
+    def __init__(self, DataType value_type, shape, dim_names=None, permutation=None):

Review Comment:
   For all other types, we typically disallow calling the `__init__`, but provide a factory function to construct the type (eg `pa.timestamp(..)` instead of `pa.TimestampType(..)`). While that gives some extra API here, we should probably follow that pattern?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158076126


##########
python/pyarrow/array.pxi:
##########
@@ -3090,6 +3090,71 @@ cdef class ExtensionArray(Array):
         return self.storage.to_numpy(**kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = FixedShapeTensorType(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        """
+        numpy_type = obj.flatten().dtype
+        arrow_type = from_numpy_dtype(numpy_type)

Review Comment:
   ```suggestion
           arrow_type = from_numpy_dtype(obj.dtype)
   ```
   
   You don't need to flatten the array to get the dtype



##########
python/pyarrow/array.pxi:
##########
@@ -3090,6 +3090,71 @@ cdef class ExtensionArray(Array):
         return self.storage.to_numpy(**kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = FixedShapeTensorType(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    def from_numpy_ndarray(obj):
+        """
+        Convert numpy tensors (ndarrays) to a fixed shape tensor extension array.
+        """
+        numpy_type = obj.flatten().dtype
+        arrow_type = from_numpy_dtype(numpy_type)
+        shape = obj.shape[1:]
+        size = obj.size / obj.shape[0]
+
+        return ExtensionArray.from_storage(
+            FixedShapeTensorType(arrow_type, shape),
+            array([t.flatten() for t in obj],
+                  list_(arrow_type, size))

Review Comment:
   Can you try to use `pa.FixedSizeListArray.from_arrays(..)` to do this zero-copy?



##########
python/pyarrow/types.pxi:
##########
@@ -4543,6 +4615,100 @@ def run_end_encoded(run_end_type, value_type):
     return pyarrow_wrap_data_type(ree_type)
 
 
+def fixedshapetensor(DataType value_type, shape, dim_names=None, permutation=None):

Review Comment:
   ```suggestion
   def fixed_shape_tensor(DataType value_type, shape, dim_names=None, permutation=None):
   ```



##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,86 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 3))
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+
+
+def test_tensor_class_methods():
+    tensor_type = pa.FixedShapeTensorType(pa.float32(), (2, 3))
+    storage = pa.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]],
+                       pa.list_(pa.float32(), 6))
+    arr = pa.ExtensionArray.from_storage(tensor_type, storage)
+    expected = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]], dtype=np.float32)
+
+    result = arr.to_numpy_ndarray()
+    np.testing.assert_array_equal(result, expected)
+
+    tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(expected)
+    assert isinstance(tensor_array_from_numpy.type, pa.FixedShapeTensorType)
+    assert tensor_array_from_numpy.type.value_type == pa.float32()
+    assert tensor_array_from_numpy.type.shape == [2, 3]

Review Comment:
   Can you add some more tests for from_numpy_ndarray? (for example starting from a numpy array that isn't C contiguous)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158487127


##########
python/pyarrow/array.pxi:
##########
@@ -3090,6 +3090,69 @@ cdef class ExtensionArray(Array):
         return self.storage.to_numpy(**kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = FixedShapeTensorType(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')
+
+        return numpy_tensor
+
+    def from_numpy_ndarray(obj):

Review Comment:
   ```suggestion
   
       @staticmethod
       def from_numpy_ndarray(obj):
   ```
   
   I don't fully understand how the current code actually runs fine (which it does given the tests are passing), but we probably need to make this a staticmethod, since you call this on the class (`pa.FixedShapeTensorArray.from_numpy_ndarray(..)`) and not on an instance (if this was a normal method, the first argument is interpreted as `self`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158182795


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1130,13 +1131,32 @@ def test_cpp_extension_in_python(tmpdir):
 
 
 def test_tensor_type():
-    tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 3))
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
     assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
     assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert not tensor_type.dim_names
+    assert not tensor_type.permutation
+
+    tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3],
+                                        permutation=[0, 2, 1])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.float64(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert not tensor_type.dim_names
+    assert tensor_type.permutation == [0, 2, 1]
+
+    tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3],
+                                        dim_names=['C', 'H', 'W'])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.bool_(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names == [tobytes(x) for x in ['C', 'H', 'W']]

Review Comment:
   Instead of converting the expected result to bytes here, I think it would be better to convert it to a list of strings in the actual bindings of the attribute



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158556462


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,89 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation is None
+
+    tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3],
+                                        permutation=[0, 2, 1])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.float64(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation == [0, 2, 1]
+
+    tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3],
+                                        dim_names=['C', 'H', 'W'])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.bool_(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names == ['C', 'H', 'W']
+    assert tensor_type.permutation is None
+
+
+@pytest.mark.parametrize("numpy_order", ('C', 'F'))
+def test_tensor_class_methods(numpy_order):
+    tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
+    storage = pa.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]],
+                       pa.list_(pa.float32(), 6))
+    arr = pa.ExtensionArray.from_storage(tensor_type, storage)
+    expected = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]], dtype=np.float32)
+
+    result = arr.to_numpy_ndarray()
+    np.testing.assert_array_equal(result, expected)
+
+    arr = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        dtype=np.float32, order=numpy_order)
+    tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+    assert isinstance(tensor_array_from_numpy.type, pa.FixedShapeTensorType)
+    assert tensor_array_from_numpy.type.value_type == pa.float32()
+    assert tensor_array_from_numpy.type.shape == [2, 3]

Review Comment:
   Great, thank you for investigating! Will use `ravel("C")` and raise an error if data wasn't C-contiguous in memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1159620598


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   A permutation would happen after the reshape, I think? (so the initial reshape is always C contiguous) And that is left to the user for now in this method (which we should document, or add)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1160420150


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   Great! I think raising an error (or a warning?) sounds good to me too 👍 
   Will add it in `to_numpy_ndarray` for cases when `permutation` is not trivial (`[0, 1, ... N-1]`). Do we want to add a warning for `dim_names not None` also?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162739613


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,115 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+
+        Note: ``permutation`` should be non-trivial (``None`` or ``[0, 1, ..., len(shape)-1]``).

Review Comment:
   ```suggestion
           Note: ``permutation`` should be trivial (``None`` or ``[0, 1, ..., len(shape)-1]``).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503605774

   > * For a specific use case, someone might not care about the permutation and just wants the physical C-contiguous array
   
   Wouldn't that not be possible with `tensor_ext_arr.storage.to_numpy()`? Or would that not be zero-copy?
   
   > * Because you want to do the handling of permutation / dim_names yourself, and therefore first want to get the numpy array as the physical array (see [GH-34882: [Python] Binding for FixedShapeTensorType #34883 (comment)](https://github.com/apache/arrow/pull/34883#issuecomment-1503146111) above for a possible reason for this, eg if you need to handle the potential combination of both permutation and/or dim_names, this might be easier to just all do yourself)
   
   How about we have pyarrow pass the permutation/strides information by default and have users use lower level API for manual handling of permutation / dim_names? I'm assuming underlying buffer doesn't get actually permuted in either case of course.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1495926046

   * Closes: #34882


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158216188


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1130,13 +1131,32 @@ def test_cpp_extension_in_python(tmpdir):
 
 
 def test_tensor_type():
-    tensor_type = pa.FixedShapeTensorType(pa.int8(), (2, 3))
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
     assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
     assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert not tensor_type.dim_names
+    assert not tensor_type.permutation

Review Comment:
   I checked and you were of course correct to assume an empty list =)
   Will correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1158538747


##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1127,3 +1127,89 @@ def test_cpp_extension_in_python(tmpdir):
     reconstructed_array = batch.column(0)
     assert reconstructed_array.type == uuid_type
     assert reconstructed_array == array
+
+
+def test_tensor_type():
+    tensor_type = pa.fixed_shape_tensor(pa.int8(), [2, 3])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.int8(), 6)
+    assert tensor_type.shape == [2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation is None
+
+    tensor_type = pa.fixed_shape_tensor(pa.float64(), [2, 2, 3],
+                                        permutation=[0, 2, 1])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.float64(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names is None
+    assert tensor_type.permutation == [0, 2, 1]
+
+    tensor_type = pa.fixed_shape_tensor(pa.bool_(), [2, 2, 3],
+                                        dim_names=['C', 'H', 'W'])
+    assert tensor_type.extension_name == "arrow.fixed_shape_tensor"
+    assert tensor_type.storage_type == pa.list_(pa.bool_(), 12)
+    assert tensor_type.shape == [2, 2, 3]
+    assert tensor_type.dim_names == ['C', 'H', 'W']
+    assert tensor_type.permutation is None
+
+
+@pytest.mark.parametrize("numpy_order", ('C', 'F'))
+def test_tensor_class_methods(numpy_order):
+    tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
+    storage = pa.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]],
+                       pa.list_(pa.float32(), 6))
+    arr = pa.ExtensionArray.from_storage(tensor_type, storage)
+    expected = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]], dtype=np.float32)
+
+    result = arr.to_numpy_ndarray()
+    np.testing.assert_array_equal(result, expected)
+
+    arr = np.array(
+        [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]],
+        dtype=np.float32, order=numpy_order)
+    tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr)
+    assert isinstance(tensor_array_from_numpy.type, pa.FixedShapeTensorType)
+    assert tensor_array_from_numpy.type.value_type == pa.float32()
+    assert tensor_array_from_numpy.type.shape == [2, 3]

Review Comment:
   Sorry, double checking the `flatten()` docs, this actually _always_ returns a copy, also for C contiguous arrays. What we actually want to use instead is `ravel("C")`, this will ensure to give a flat output looking at the original array as a row-major array, and only copy if needed (if the original array wasn't actually row major)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1160507428


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   > Ah, actually I need to error if `permutation[0] != 0`.
   
   No, the `self.type.permutation` is only that of the individual tensor elements (given that you have a FixedShapeTensorArray, by definition the first dimension of the n+1 dim ndarray is always the length of the array, and can't be permuted)
   
   > And if the permutation is for example `[0, 2, 1]` the user could still use the information about the permutation to rearrange the ordering after the reshape in `to_numpy_ndarray`.
   
   Yes, and as mentioned before, I think we should certainly add examples how the user can do this (if we decide to not automatically do it in `to_numpy_ndarray`)
   
   > Do we want to add a warning for dim_names not None also?
   
   I wouldn't raise a warning for that, since it's a fact of life that numpy arrays don't have dimension names, so that's just a consequence of calling this method. We can mention that in the docstring, though, to be explicit.
   
   
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503210966

   @jorisvandenbossche if u agree we can go ahead and merge this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1162408578


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,113 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+
+        Note: The method doesn't take into account ``permutation`` and ``dim_names``
+        parameter therefore this information will be lost.
+        """

Review Comment:
   > Trivial permutation should be equal to None (no permutation), I think? Then we could just check for non None values?
   
   It can also be [0, 1, .., ndim-1]. It might be simpler to check for `self.type.permutation is None or self.type.permutation == list(range(self.type.ndim))`. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34883:
URL: https://github.com/apache/arrow/pull/34883#discussion_r1160461236


##########
python/pyarrow/array.pxi:
##########
@@ -3076,6 +3076,111 @@ cdef class ExtensionArray(Array):
         return Array._to_pandas(self.storage, options, **kwargs)
 
 
+class FixedShapeTensorArray(ExtensionArray):
+    """
+    Concrete class for fixed shape tensor extension arrays.
+
+    Examples
+    --------
+    Define the extension type for tensor array
+
+    >>> import pyarrow as pa
+    >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), [2, 2])
+
+    Create an extension array
+
+    >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+    >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+    >>> pa.ExtensionArray.from_storage(tensor_type, storage)
+    <pyarrow.lib.FixedShapeTensorArray object at ...>
+    [
+      [
+        1,
+        2,
+        3,
+        4
+      ],
+      [
+        10,
+        20,
+        30,
+        40
+      ],
+      [
+        100,
+        200,
+        300,
+        400
+      ]
+    ]
+    """
+
+    def to_numpy_ndarray(self):
+        """
+        Convert fixed shape tensor extension array to a numpy array (with dim+1).
+        """
+        np_flat = np.asarray(self.storage.values)
+        numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape),
+                                       order='C')

Review Comment:
   Ah, actually I need to error if `permutation[0] != 0`. And if the permutation is for example `[0, 2, 1]` the user could still use the information about the permutation to rearrange the ordering after the reshape in `to_numpy_ndarray`. Please correct me if I am wrong =)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #34883: GH-34882: [Python] Binding for FixedShapeTensorType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34883:
URL: https://github.com/apache/arrow/pull/34883#issuecomment-1503616830

   Short interruption of the continuation of the thread 😊 
   
   Thank you Joris and Rok for all the help and reviews! 🙏 
   The docs for this PR are ready for review: https://github.com/apache/arrow/pull/34957


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org