Posted to issues@arrow.apache.org by "Christian Hudon (Jira)" <ji...@apache.org> on 2020/05/05 20:45:00 UTC

[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type implemented using ExtensionType

    [ https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100249#comment-17100249 ] 

Christian Hudon commented on ARROW-1614:
----------------------------------------

This is a blocker for some Arrow use cases for us, so I'd be willing to work on this, with a bit of guidance. The first step would be to agree on the approach to take.

For me, there are two cases I'd need Arrow to support:
 # Each row is a tensor of a different shape (e.g. images of different sizes), but of the same underlying type (e.g. int32). I don't foresee needing rows whose tensors have different numbers of dimensions, so that could be out of scope if desired.
 # All rows have the same shape (so the whole column could potentially be handed off to e.g. scikit-image, as an array of n images of the same size).

From what I understand of Arrow, here's how I would implement this:
 # For the first case: one column containing the elements of all the tensors (each in row-major order), and a second column containing a tuple with each tensor's shape. The start offset of each tensor's data can be computed from the start offset and shape of the previous one. (Would we also need a separate column containing the precomputed start index for each tensor?)
 # For the second case: similarly, the data from all the tensors would be stored together in row-major order. The per-row shape (the column's shape without its first dimension) would be stored in the metadata for that column.
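
To make the two layouts concrete, here is a minimal sketch in plain Python. It does not use Arrow's API; the names (values, shapes, offsets, fixed_metadata, variable_row, fixed_row) are hypothetical and only illustrate how the offsets would be derived from the shapes.

```python
from itertools import accumulate
from math import prod

# Hypothetical layout 1: variable-shape tensors sharing one element type.
# `values` holds every element of every tensor, each in row-major order;
# `shapes` holds one shape tuple per row.
values = [1, 2, 3, 4, 5, 6,   # row 0, shape (2, 3)
          7, 8, 9, 10]        # row 1, shape (2, 2)
shapes = [(2, 3), (2, 2)]

# Precomputed start offsets: running sum of element counts, with a leading 0.
offsets = [0, *accumulate(prod(s) for s in shapes)]


def variable_row(i):
    """Return (flat elements, shape) of tensor i in the variable-shape layout."""
    return values[offsets[i]:offsets[i + 1]], shapes[i]


# Hypothetical layout 2: every row has the same shape, stored once as
# column-level metadata instead of once per row.
fixed_values = list(range(12))       # 3 rows, each of shape (2, 2)
fixed_metadata = {"shape": (2, 2)}


def fixed_row(i):
    """Return the flat elements of tensor i in the fixed-shape layout."""
    size = prod(fixed_metadata["shape"])
    return fixed_values[i * size:(i + 1) * size]
```

In this sketch the offsets column is redundant with the shapes column, but storing it would presumably give constant-time access to any row instead of a scan over all preceding shapes.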

Thoughts?

> [C++] Add a Tensor logical value type implemented using ExtensionType
> ---------------------------------------------------------------------
>
>                 Key: ARROW-1614
>                 URL: https://issues.apache.org/jira/browse/ARROW-1614
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Format
>            Reporter: Wes McKinney
>            Priority: Major
>
> In an Arrow table, conceivably a column could have value cells each containing a tensor value of some size (a binary value plus some metadata to store type and shape/strides)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)