Posted to dev@parquet.apache.org by huaxin gao <hu...@gmail.com> on 2022/02/11 04:00:38 UTC
Parquet Column Resolution by ID
Hi Parquet community,
Xinli and I drafted a design doc to support ID-based column resolution in
Parquet. Here is the link
<https://docs.google.com/document/d/1hDLFIKuVhhnTNpA5bTo4nfD-MUZz8Iq4V9FXrr1WPsw/edit?usp=sharing>.
We'd like to start a discussion on the doc and any feedback is welcome!
Thanks,
Huaxin
Re: Parquet Column Resolution by ID
Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Hi,
Thanks for the write-up!
Two questions:
* AFAIK most implementations identify which columns belong to a (nested)
field via the path in the schema (i.e. given field "a", give me all the
columns that are part of that field, e.g. "a.b.c", "a.d", etc.). How would
that work with field ids?
* Is the following change backward incompatible?
> With the support of column id resolution, the column ids must be unique
> in the entire Parquet schema in order to identify a column correctly. In
> the write path, an Exception will be thrown if the ids are not unique
Could it make sense to rephrase it as:
* Writers MAY write a unique column id per field in order to identify a
column irrespective of its name (e.g. across column renames).
* If a reader identifies that a Parquet file has unique column ids, it MAY
use column ids to identify columns (ignoring the column name).
This may be backward compatible and makes it an opt-in feature.
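To make the two points above concrete, here is a minimal Python sketch of
path-based lookup next to an opt-in, id-based lookup that falls back to names.
It is only an illustration under stated assumptions: Leaf, columns_under,
has_unique_ids and resolve are hypothetical names, not part of any Parquet
library, and ids are attached to leaf columns only, which sidesteps the
nested-field question rather than answering it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for one leaf column of a Parquet schema."""
    path: tuple                     # path in the schema, e.g. ("a", "b", "c")
    field_id: Optional[int] = None  # None when the writer set no id


def columns_under(leaves, field_path):
    """Path-based lookup: every leaf whose path starts with field_path."""
    prefix = tuple(field_path)
    return [leaf for leaf in leaves if leaf.path[:len(prefix)] == prefix]


def has_unique_ids(leaves):
    """True only if every leaf carries an id and no id repeats."""
    ids = [leaf.field_id for leaf in leaves]
    return None not in ids and len(set(ids)) == len(ids)


def resolve(leaves, name_path, field_id=None):
    """Opt-in resolution: use the id only when the file's ids are usable."""
    if field_id is not None and has_unique_ids(leaves):
        # Ids are unique, so the name is ignored (this survives renames).
        return [leaf for leaf in leaves if leaf.field_id == field_id]
    # Missing or duplicate ids: behave like an existing name-based reader.
    return columns_under(leaves, name_path)


# A file whose writer assigned unique ids.
with_ids = [Leaf(("a", "b", "c"), 1), Leaf(("a", "d"), 2), Leaf(("e",), 3)]
# A file with duplicate ids, as an existing writer might have produced.
legacy = [Leaf(("x",), 1), Leaf(("y",), 1)]

nested = columns_under(with_ids, ("a",))
renamed = resolve(with_ids, ("old_name",), field_id=3)
fallback = resolve(legacy, ("y",), field_id=1)
```

In this sketch a file with duplicate or missing ids is still readable by name,
which is the sense in which the MAY wording keeps the feature opt-in instead of
failing in the write path.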
Best,
Jorge
Re: Parquet Column Resolution by ID
Posted by Gidon Gershinsky <gg...@gmail.com>.
Thanks Xinli, works well now. I've reviewed the doc.
Cheers, Gidon
On Fri, Feb 11, 2022 at 7:21 PM Xinli shang <sh...@uber.com.invalid> wrote:
> Hi Gidon,
>
> I just shared the 'comment' permission for everybody. Let me know if you
> still have issues with it.
>
> Xinli
>
> On Thu, Feb 10, 2022 at 9:45 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > Hi Huaxin,
> >
> > Can you open this document for comments?
> >
> > Cheers, Gidon
> >
> --
> Xinli Shang
>
Re: Parquet Column Resolution by ID
Posted by Xinli shang <sh...@uber.com.INVALID>.
Hi Gidon,
I just shared the 'comment' permission for everybody. Let me know if you
still have issues with it.
Xinli
On Thu, Feb 10, 2022 at 9:45 PM Gidon Gershinsky <gg...@gmail.com> wrote:
> Hi Huaxin,
>
> Can you open this document for comments?
>
> Cheers, Gidon
>
--
Xinli Shang
Re: Parquet Column Resolution by ID
Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi Huaxin,
Can you open this document for comments?
Cheers, Gidon