Posted to dev@parquet.apache.org by huaxin gao <hu...@gmail.com> on 2022/02/11 04:00:38 UTC

Parquet Column Resolution by ID

Hi Parquet community,

Xinli and I drafted a design doc to support ID-based column resolution in
Parquet. Here is the link:
<https://docs.google.com/document/d/1hDLFIKuVhhnTNpA5bTo4nfD-MUZz8Iq4V9FXrr1WPsw/edit?usp=sharing>.
We'd like to start a discussion on the doc, and any feedback is welcome!

Thanks,
Huaxin

Re: Parquet Column Resolution by ID

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Hi,

Thanks for the write-up!

Two questions:

* AFAIK most implementations identify which columns belong to a (nested)
field via the path in the schema (i.e. given field "a", give me all the
columns that are part of that field, e.g. "a.b.c", "a.d", etc.). How would
that work with field ids?

* The change

> With the support of column id resolution, the column ids must be unique
> in the entire Parquet schema in order to identify a column correctly. In
> the write path, an Exception will be thrown if the ids are not unique

Is this backward incompatible? Would it make sense to rephrase it as:

* Writers MAY write a unique column id per field in order to identify a
column irrespective of its name (e.g. across column renames)
* If a reader identifies that a Parquet file has unique column ids, it MAY
use column ids to identify columns (ignoring the column name).

This may be backward compatible and makes it an opt-in feature.
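
To make the two points above concrete, here is a minimal Python sketch of
path-based lookup for a nested field and of the opt-in reader rule. It is
not taken from the design doc or from any Parquet library: the Leaf class
and the functions columns_under, ids_usable and resolve are hypothetical
names, and ids are assumed to sit on leaf columns only. How ids on group
fields would map to their leaf columns is the open question raised above
and is not addressed here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a leaf column: dotted path plus optional id."""
    path: str
    field_id: Optional[int] = None


def columns_under(leaves: List[Leaf], field: str) -> List[str]:
    """Name-based lookup: leaf paths equal to `field` or nested below it."""
    prefix = field + "."
    return [l.path for l in leaves if l.path == field or l.path.startswith(prefix)]


def ids_usable(leaves: List[Leaf]) -> bool:
    """True only if every leaf carries an id and no id is repeated."""
    ids = [l.field_id for l in leaves]
    if not ids or any(i is None for i in ids):
        return False
    return all(count == 1 for count in Counter(ids).values())


def resolve(leaves: List[Leaf], name: str, field_id: Optional[int] = None) -> List[str]:
    """Opt-in rule: use the id when ids are usable, else fall back to the name."""
    if field_id is not None and ids_usable(leaves):
        return [l.path for l in leaves if l.field_id == field_id]
    return columns_under(leaves, name)


# Assumed example schemas, for illustration only.
nested = [Leaf("a.b.c", 1), Leaf("a.d", 2), Leaf("e", 3)]
renamed = [Leaf("a.b.c", 1), Leaf("a.d", 2), Leaf("e_renamed", 3)]
duplicated = [Leaf("x", 1), Leaf("y", 1)]

print(columns_under(nested, "a"))              # leaves of nested field "a"
print(resolve(renamed, "e", field_id=3))       # id survives the rename
print(resolve(duplicated, "x", field_id=1))    # duplicate ids: name is used
```

In this sketch a file with missing or duplicate ids is still readable by
name, which is what keeps the rule opt-in rather than a hard failure.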

Best,
Jorge





Re: Parquet Column Resolution by ID

Posted by Gidon Gershinsky <gg...@gmail.com>.
Thanks Xinli, works well now. I've reviewed the doc.

Cheers, Gidon


On Fri, Feb 11, 2022 at 7:21 PM Xinli shang <sh...@uber.com.invalid> wrote:

> Hi Gidon,
>
> I just shared the 'comment' permission for everybody. Let me know if you
> still have issues with it.
>
> Xinli
>

Re: Parquet Column Resolution by ID

Posted by Xinli shang <sh...@uber.com.INVALID>.
Hi Gidon,

I just shared the 'comment' permission for everybody. Let me know if you
still have issues with it.

Xinli

On Thu, Feb 10, 2022 at 9:45 PM Gidon Gershinsky <gg...@gmail.com> wrote:

> Hi Huaxin,
>
> Can you open this document for comments?
>
> Cheers, Gidon
>
>


-- 
Xinli Shang

Re: Parquet Column Resolution by ID

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi Huaxin,

Can you open this document for comments?

Cheers, Gidon

