Posted to dev@arrow.apache.org by Alenka Frim <al...@voltrondata.com.INVALID> on 2023/03/06 11:51:00 UTC

Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type

No problem Kevin. Thank you for sharing the information with your
colleagues.
All comments are much appreciated.

As there were no additional comments/suggestions to the spec itself, I will
open up another voting thread today.

Thanks all!
Alenka

On Tue, Feb 28, 2023 at 11:11 AM Kevin Gurney <kg...@mathworks.com> wrote:

> Hi Alenka,
>
> Thank you. I've informed my colleagues at MathWorks to add any further
> comments to the PR.
>
> My apologies for bringing this up on the voting thread.
>
> Best Regards,
>
> Kevin Gurney
>
> ________________________________
> From: Alenka Frim <al...@voltrondata.com.INVALID>
> Sent: Tuesday, February 28, 2023 4:19 AM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
>
> This was actually already meant as the voting thread, but given it sparked
> some more discussion, let's give this a few more days, and then re-start
> with a new vote thread.
>
> *So if someone still has comments on the current text, please bring those
> up here or in the PR*: https://github.com/apache/arrow/pull/33925.
>
> Alenka
>
> On Fri, Feb 24, 2023 at 10:15 AM Kevin Gurney <kg...@mathworks.com> wrote:
>
> > Hi All,
> >
> > Thank you very much for creating this proposal, Alenka!
> >
> > I noticed the following in the notes [1] shared from the February 15th
> > Arrow Community Meeting:
> >
> > "Members of Hugging Face, Ray, and PyTorch community have given input and
> > some of it was incorporated - It would be good to have input from some
> > other companies and project communities including Lance, NumPy, Posit,
> > MATLAB, DLPack, CUDA/RAPIDS, Arrow Rust, Xarray, Julia, Fortran,
> > TensorFlow, LinkedIn"
> >
> > Based on the inclusion of MATLAB in the list above, I've shared this
> > proposal with some colleagues at MathWorks who have expertise in the deep
> > learning area. They will respond here if they have any additional input
> > to add.
> >
> > That being said, I recognize that this proposal is already nearing the
> > voting phase.
> >
> > [1] https://lists.apache.org/thread/bblcwwq7gl1x2hsr1qsormv9f3vr23jn
> >
> > Best Regards,
> >
> > Kevin Gurney
> >
> > ________________________________
> > From: Rok Mihevc <ro...@gmail.com>
> > Sent: Thursday, February 23, 2023 8:12 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
> >
> > That makes sense indeed.
> > Do we have any more comments on the language of the proposal [1] or
> > should we proceed to vote?
> >
> > Rok
> >
> > [1] https://github.com/apache/arrow/pull/33925/files
> >
> > On Wed, Feb 22, 2023 at 2:13 PM Antoine Pitrou <an...@python.org> wrote:
> >
> > >
> > > That's a good point.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On 22/02/2023 at 14:11, Dewey Dunnington wrote:
> > > > I don't think having both dimension names and permutation is
> > > > redundant... dimension names can also serve as human-readable tags
> > > > that help a human interpret the values. If reading a NetCDF, for
> > > > example, one might store the dimension variable names. When determining
> > > > type equality it may be useful that {..., permutation = [2, 0, 1],
> > > > dim_names = ["C", "H", "W"]} is not equal to {..., permutation =
> > > > [2, 0, 1], dim_names = ["x", "y", "z"]}.
> > > >
> > > > On Wed, Feb 22, 2023 at 4:56 AM Rok Mihevc <ro...@gmail.com> wrote:
> > > >
> > > >>>
> > > >>>>>
> > > >>>>> Should we rule that `dim_names` and `permutation` are mutually
> > > >>>>> exclusive?
> > > >>>>>
> > > >>>>
> > > >>>> Since `dim_names` have to "map to the physical layout (row-major)"
> > > >>>> that means permutation will always be trivial which indeed makes it
> > > >>>> unnecessary to store both.
> > > >>>
> > > >>> I don't think it is necessarily needed to explicitly make them
> > > >>> mutually exclusive. I don't know how useful this would be in practice,
> > > >>> but you certainly *can* specify both in a meaningful way. Re-using the
> > > >>> example of NHWC data, which is physically stored as NCHW, you can keep
> > > >>> track of this by specifying a permutation of [2, 0, 1], but at the
> > > >>> same time you could also still save the dimension names as ["C", "H",
> > > >>> "W"].
> > > >>>
> > > >>
> > > >> I'll advocate for the original comment, but I'm ok either way. Having
> > > >> both `dim_names` and `permutation` is redundant - if the user knows
> > > >> their desired order of `dim_names` they can derive the permutation. If
> > > >> they don't use `dim_names` they probably don't want them.
> > > >>
> > > >
> > >
> >
>
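
The two positions in the thread on `dim_names` and `permutation` can be
sketched in plain Python. This is only an illustration, not the Arrow API:
`TensorTypeParams` and `derive_permutation` are hypothetical names, and the
sketch assumes that `dim_names` follow the physical layout and that
`permutation[i]` is the logical index of the i-th physical dimension. It shows
that a permutation can be derived when both name orders are known, and that
two parameter sets with the same permutation but different names compare as
unequal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TensorTypeParams:
    # Hypothetical model of the proposal's parameters; not an Arrow class.
    shape: Tuple[int, ...]
    dim_names: Optional[Tuple[str, ...]] = None    # assumed: names in physical order
    permutation: Optional[Tuple[int, ...]] = None  # assumed: logical index per physical dim


def derive_permutation(physical: Sequence[str], logical: Sequence[str]) -> Tuple[int, ...]:
    """Return, for each physical dimension name, its position in the logical order."""
    if len(set(physical)) != len(physical) or sorted(physical) != sorted(logical):
        raise ValueError("both sequences must contain the same unique names")
    logical = list(logical)
    return tuple(logical.index(name) for name in physical)


# Data stored physically as C, H, W but read logically as H, W, C.
perm = derive_permutation(["C", "H", "W"], ["H", "W", "C"])

# Same shape and permutation, different names: the dataclass-generated
# equality compares every field, so these two are not equal.
chw = TensorTypeParams((3, 4, 5), ("C", "H", "W"), perm)
xyz = TensorTypeParams((3, 4, 5), ("x", "y", "z"), perm)
same_type = chw == xyz
```

Under these assumptions `perm` matches the `[2, 0, 1]` example from the
thread. Whether an actual implementation includes `dim_names` in type equality
is a decision for the spec and is not settled by this sketch.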