Posted to dev@parquet.apache.org by Antoine Pitrou <an...@python.org> on 2021/03/30 09:35:56 UTC

New parquet-format release?

Hello,

The LZ4_RAW codec specification has been merged in parquet-format (*)
and should allow for better interoperability between implementations
compared to the current LZ4 codec (**).

(*) https://github.com/apache/parquet-format/pull/168

(**) see parquet-dev discussion on
https://mail-archives.apache.org/mod_mbox/parquet-dev/202102.mbox/%3C20210216151401.7647ce37%40fsol%3E

Before an implementation of this new codec can be released, though, a
new release of parquet-format is needed.  What do you think?  Am I
right in assuming that doing a release falls on the shoulders of the PMC?

Best regards

Antoine.



Re: New parquet-format release?

Posted by Gabor Szadovszky <ga...@apache.org>.
To be honest, I did not have time to work on it. There are a couple of things
to be finalized. I would like to list all the optional fields in the Thrift
file that are practically required, at least in certain cases.
We also have a couple of open questions:
- V1 vs V2 pages
<https://github.com/apache/parquet-format/pull/164/files#r592991912>
- unsigned integers
<https://github.com/apache/parquet-format/pull/164/files#r593002660>
- supported encodings
<https://github.com/apache/parquet-format/pull/164/files#r593026605>

Feel free to comment in the PR or, if you think a topic requires a wider
audience, we might start a separate discussion here on the dev list.
(Meanwhile, we still have the issue that this update heavily impacts
parquet implementations that may not be part of the parquet community.)

Cheers,
Gabor


On Sat, Apr 3, 2021 at 5:33 AM Micah Kornfield <em...@gmail.com>
wrote:

> >
> > "Core features" is clearly not in a shape to be finalized soon so we
> > can postpone it to the release after.
>
>
> What do we think we need to do to get it to a releasable state?
>
> On Tue, Mar 30, 2021 at 6:44 AM Gabor Szadovszky
> <ga...@cloudera.com.invalid> wrote:
>
> > Thanks a lot, Antoine for the summary and heads up. #166 is merged
> > already. The others do not seem to be crucial for the next release but
> > I am fine waiting a bit for the authors' response. (parquet-format
> > thrift bump is not really important because even though we are
> > releasing the generated java classes we are not using them in
> > parquet-mr so this is mainly a testing issue.)
> > "Core features" is clearly not in a shape to be finalized soon so we
> > can postpone it to the release after.
> >
> > Cheers,
> > Gabor
> >
> > On Tue, Mar 30, 2021 at 12:58 PM Antoine Pitrou <an...@python.org>
> > wrote:
> > >
> > >
> > > Hi Gabor,
> > >
> > > Ok, I went through the open PRs.  The following PRs seem basically ready,
> > > just waiting for final feedback (and possible updates) from the
> > > submitters:
> > >
> > > * https://github.com/apache/parquet-format/pull/166
> > >   (PARQUET-1969: Migrate testing from Travis-CI to Github Actions)
> > >
> > > * https://github.com/apache/parquet-format/pull/158
> > >   (PARQUET-1779: Update merge script)
> > >
> > > The following PR needs polishing; I'll wait for feedback from the
> > > submitter and if there is none, will probably push an update myself:
> > >
> > > * https://github.com/apache/parquet-format/pull/162
> > >   (PARQUET-1930: Bump Apache Thrift to 0.13.0)
> > >
> > > Of the remaining PRs:
> > >
> > > * https://github.com/apache/parquet-format/pull/164 looks desirable but
> > >   is still in draft state, and I assume will require a bit more
> > >   massaging and/or a final agreement (and perhaps a formal vote?)
> > >   (PARQUET-1950: Define core features)
> > >
> > > * there are a couple of proposed format additions which don't seem to
> > >   have gathered a lot of interest, and are therefore most probably out
> > >   of scope for a forthcoming release
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
> > > On Tue, 30 Mar 2021 12:07:44 +0200
> > > Gabor Szadovszky <ga...@apache.org> wrote:
> > > > Hi Antoine,
> > > >
> > > > There are a couple of ongoing PRs in the parquet-format repo. However,
> > > > some may take very long (e.g. core features) but some are only waiting
> > > > for review (e.g. #166).
> > > > I agree that solving the current situation of LZ4 is worth a
> > > > parquet-format release but the ready PRs should also be included.
> > > >
> > > > Practically any committer can work on a release. (See
> > > > http://parquet.apache.org/documentation/how-to-release/ for details.)
> > > > As per the process PMC members are only required to vote on the
> > > > release.
> > > >
> > > > Regards,
> > > > Gabor
> > > >
> > >
> > >
> > >
> >
>

Re: New parquet-format release?

Posted by Micah Kornfield <em...@gmail.com>.
>
> "Core features" is clearly not in a shape to be finalized soon so we
> can postpone it to the release after.


What do we think we need to do to get it to a releasable state?


Re: New parquet-format release?

Posted by Gabor Szadovszky <ga...@cloudera.com.INVALID>.
Thanks a lot, Antoine, for the summary and heads-up. #166 is merged
already. The others do not seem to be crucial for the next release, but
I am fine waiting a bit for the authors' response. (The parquet-format
Thrift bump is not really important because, even though we are
releasing the generated Java classes, we are not using them in
parquet-mr, so this is mainly a testing issue.)
"Core features" is clearly not in a shape to be finalized soon so we
can postpone it to the release after.

Cheers,
Gabor


Re: New parquet-format release?

Posted by Antoine Pitrou <an...@python.org>.
Hi Gabor,

Ok, I went through the open PRs.  The following PRs seem basically ready,
just waiting for final feedback (and possible updates) from the
submitters:

* https://github.com/apache/parquet-format/pull/166
  (PARQUET-1969: Migrate testing from Travis-CI to Github Actions)

* https://github.com/apache/parquet-format/pull/158
  (PARQUET-1779: Update merge script)

The following PR needs polishing; I'll wait for feedback from the
submitter and if there is none, will probably push an update myself:

* https://github.com/apache/parquet-format/pull/162
  (PARQUET-1930: Bump Apache Thrift to 0.13.0)

Of the remaining PRs:

* https://github.com/apache/parquet-format/pull/164 looks desirable but
  is still in draft state, and I assume will require a bit more
  massaging and/or a final agreement (and perhaps a formal vote?)
  (PARQUET-1950: Define core features)

* there are a couple of proposed format additions which don't seem to
  have gathered a lot of interest, and are therefore most probably out
  of scope for a forthcoming release

Regards

Antoine.




Re: New parquet-format release?

Posted by Gabor Szadovszky <ga...@apache.org>.
Hi Antoine,

There are a couple of ongoing PRs in the parquet-format repo. Some may
take very long (e.g. core features), while others are only waiting for
review (e.g. #166).
I agree that solving the current situation of LZ4 is worth a
parquet-format release, but the ready PRs should also be included.

Practically any committer can work on a release. (See
http://parquet.apache.org/documentation/how-to-release/ for details.)
As per the process, PMC members are only required to vote on the
release.

Regards,
Gabor