Posted to dev@parquet.apache.org by Gang Wu <us...@gmail.com> on 2023/05/02 03:33:05 UTC

Re: [DISCUSS] Time to release parquet format 2.10.0?

Thanks Fokko!

Let us wait for more input to see whether it is good to proceed.

Best,
Gang

On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <fo...@apache.org> wrote:

> Hey Gang,
>
> Great bringing this up, I think that would be a great idea!
>
> Kind regards,
> Fokko
>
> On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
>
> > Hi,
> >
> > The latest parquet format is v2.9.0 [1], which was released two years ago.
> > Is it a good time to release the next version? If there is no objection, I
> > can volunteer to be the release manager.
> >
> > [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
> >
> > Best,
> > Gang
> >
>

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Gang Wu <us...@gmail.com>.
Hi all,

Now that we have merged PARQUET-758 [1] and PARQUET-2261 [2], I
think it is a good time to move forward with the v2.10 release process. I
do notice that there is an ongoing effort with PARQUET-2249 [3]. Given its
current status, I do not think it will be closed soon. If there is no
objection, I volunteer to be the release manager and go ahead.

[1] https://issues.apache.org/jira/browse/PARQUET-758
[2] https://issues.apache.org/jira/browse/PARQUET-2261
[3] https://issues.apache.org/jira/browse/PARQUET-2249

Thanks,
Gang

On Mon, Jul 17, 2023 at 2:10 PM Gang Wu <us...@gmail.com> wrote:

> I probably don't have much bandwidth to work on it this month. BTW,
> a PoC implementation in parquet-mr is also required, right? I can
> start to work on this but cannot provide a precise ETA yet.
>
> Best,
> Gang
>
> On Mon, Jul 17, 2023 at 12:17 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
>> I'm sorry, I've had less time to dedicate to this than I expected.  Gang, do
>> you have bandwidth to work on it?  I can help review.  Otherwise, I will see
>> if I can make time this month.
>>
>> On Sat, May 13, 2023 at 10:53 AM Xinli shang <sh...@uber.com.invalid>
>> wrote:
>>
>> > Thanks, Gang, for taking the lead on this! I agree we should have a new
>> > release. In addition to PARQUET-2261, there was also a discussion in Feb
>> > with PMCs about PARQUET-758. We may want to check with Antoine
>> > Pitrou <https://github.com/pitrou> on whether PARQUET-758 should be
>> > included as well.
>> >
>> >
>> >
>> > On Sat, May 13, 2023 at 9:51 AM Micah Kornfield <em...@gmail.com>
>> > wrote:
>> >
>> > > >
>> > > >  BTW, I'd like to see the implementation from Micah to fully
>> > > > understand the use case. If he is too busy to do that, I can do it
>> > based
>> > > on
>> > > > my understanding.
>> > >
>> > >
>> > > I can allocate some time to try to make a PoC in C++ next month if we
>> are
>> > > willing to wait until then.
>> > >
>> > > On Fri, May 12, 2023 at 5:04 AM Gang Wu <us...@gmail.com> wrote:
>> > >
>> > > > I think we can wait for a complete PoC implementation of
>> PARQUET-2261
>> > > > before release. BTW, I'd like to see the implementation from Micah
>> to
>> > > fully
>> > > > understand the use case. If he is too busy to do that, I can do it
>> > based
>> > > on
>> > > > my understanding.
>> > > >
>> > > > Best,
>> > > > Gang
>> > > >
>> > > > On Fri, May 12, 2023 at 4:34 PM Gábor Szádovszky <ga...@apache.org>
>> > > wrote:
>> > > >
>> > > > > Thanks a lot for volunteering, Gang!
>> > > > >
>> > > > > However it is more than 2 years indeed since the last release I
>> think
>> > > the
>> > > > > actual changes since then are more important. There are lots of
>> > > > > additions/corrections in the spec docs and the thrift file
>> comments
>> > > which
>> > > > > are very important but not tightly attached to a format release. I
>> > only
>> > > > can
>> > > > > see PARQUET-2257 that contains an actual change in the thrift
>> > > structure.
>> > > > >
>> > > > > Related to the ongoing effort of PARQUET-2261: I think, we are
>> > waiting
>> > > > for
>> > > > > a PoC implementation. @emkornfield: Do you plan to work on this?
>> > > > >
>> > > > > The question is if we think PARQUET-2257 is urgent enough to not
>> to
>> > > wait
>> > > > > for PARQUET-2261 and have an additional release after the latter
>> is
>> > > ready
>> > > > > or we shall wait for the PoC implementation and release format
>> after
>> > > it.
>> > > > >
>> > > > > On 2023/05/02 03:33:05 Gang Wu wrote:
>> > > > > > Thanks Fokko!
>> > > > > >
>> > > > > > Let us just wait for more inputs to see if it is good to
>> proceed.
>> > > > > >
>> > > > > > Best,
>> > > > > > Gang
>> > > > > >
>> > > > > > On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <
>> fokko@apache.org
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hey Gang,
>> > > > > > >
>> > > > > > > Great bringing this up, I think that would be a great idea!
>> > > > > > >
>> > > > > > > Kind regards,
>> > > > > > > Fokko
>> > > > > > >
>> > > > > > > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <ustcwg@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > The latest parquet format is v2.9.0 [1] which was released
>> two
>> > > > years
>> > > > > ago.
>> > > > > > > > Is it a good time to release the next version? If there is
>> no
>> > > > > objection,
>> > > > > > > I
>> > > > > > > > can
>> > > > > > > > volunteer to be the release manager.
>> > > > > > > >
>> > > > > > > > [1]
>> > > > https://github.com/apache/parquet-format/blob/master/CHANGES.md
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Gang
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> > Xinli Shang
>> >
>>
>

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Gang Wu <us...@gmail.com>.
I probably don't have much bandwidth to work on it this month. BTW,
a PoC implementation in parquet-mr is also required, right? I can
start to work on this but cannot provide a precise ETA yet.

Best,
Gang

On Mon, Jul 17, 2023 at 12:17 PM Micah Kornfield <em...@gmail.com>
wrote:

> I'm sorry, I've had less time to dedicate to this than I expected.  Gang, do
> you have bandwidth to work on it?  I can help review.  Otherwise, I will see
> if I can make time this month.
>
> On Sat, May 13, 2023 at 10:53 AM Xinli shang <sh...@uber.com.invalid>
> wrote:
>
> > Thanks, Gang, for taking the lead on this! I agree we should have a new
> > release. In addition to PARQUET-2261, there was also a discussion in Feb
> > with PMCs about PARQUET-758. We may want to check with Antoine
> > Pitrou <https://github.com/pitrou> on whether PARQUET-758 should be
> > included as well.
> >
> >
> >
> > On Sat, May 13, 2023 at 9:51 AM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> > > >
> > > >  BTW, I'd like to see the implementation from Micah to fully
> > > > understand the use case. If he is too busy to do that, I can do it
> > based
> > > on
> > > > my understanding.
> > >
> > >
> > > I can allocate some time to try to make a PoC in C++ next month if we
> are
> > > willing to wait until then.
> > >
> > > On Fri, May 12, 2023 at 5:04 AM Gang Wu <us...@gmail.com> wrote:
> > >
> > > > I think we can wait for a complete PoC implementation of PARQUET-2261
> > > > before release. BTW, I'd like to see the implementation from Micah to
> > > fully
> > > > understand the use case. If he is too busy to do that, I can do it
> > based
> > > on
> > > > my understanding.
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Fri, May 12, 2023 at 4:34 PM Gábor Szádovszky <ga...@apache.org>
> > > wrote:
> > > >
> > > > > Thanks a lot for volunteering, Gang!
> > > > >
> > > > > However it is more than 2 years indeed since the last release I
> think
> > > the
> > > > > actual changes since then are more important. There are lots of
> > > > > additions/corrections in the spec docs and the thrift file comments
> > > which
> > > > > are very important but not tightly attached to a format release. I
> > only
> > > > can
> > > > > see PARQUET-2257 that contains an actual change in the thrift
> > > structure.
> > > > >
> > > > > Related to the ongoing effort of PARQUET-2261: I think, we are
> > waiting
> > > > for
> > > > > a PoC implementation. @emkornfield: Do you plan to work on this?
> > > > >
> > > > > The question is if we think PARQUET-2257 is urgent enough to not to
> > > wait
> > > > > for PARQUET-2261 and have an additional release after the latter is
> > > ready
> > > > > or we shall wait for the PoC implementation and release format
> after
> > > it.
> > > > >
> > > > > On 2023/05/02 03:33:05 Gang Wu wrote:
> > > > > > Thanks Fokko!
> > > > > >
> > > > > > Let us just wait for more inputs to see if it is good to proceed.
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <
> fokko@apache.org
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hey Gang,
> > > > > > >
> > > > > > > Great bringing this up, I think that would be a great idea!
> > > > > > >
> > > > > > > Kind regards,
> > > > > > > Fokko
> > > > > > >
> > > > > > > > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > The latest parquet format is v2.9.0 [1] which was released
> two
> > > > years
> > > > > ago.
> > > > > > > > Is it a good time to release the next version? If there is no
> > > > > objection,
> > > > > > > I
> > > > > > > > can
> > > > > > > > volunteer to be the release manager.
> > > > > > > >
> > > > > > > > [1]
> > > > https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Gang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Xinli Shang
> >
>

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Micah Kornfield <em...@gmail.com>.
I'm sorry, I've had less time to dedicate to this than I expected.  Gang, do
you have bandwidth to work on it?  I can help review.  Otherwise, I will see
if I can make time this month.

On Sat, May 13, 2023 at 10:53 AM Xinli shang <sh...@uber.com.invalid>
wrote:

> Thanks, Gang, for taking the lead on this! I agree we should have a new
> release. In addition to PARQUET-2261, there was also a discussion in Feb
> with PMCs about PARQUET-758. We may want to check with Antoine
> Pitrou <https://github.com/pitrou> on whether PARQUET-758 should be
> included as well.
>
>
>
> On Sat, May 13, 2023 at 9:51 AM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > >
> > >  BTW, I'd like to see the implementation from Micah to fully
> > > understand the use case. If he is too busy to do that, I can do it
> based
> > on
> > > my understanding.
> >
> >
> > I can allocate some time to try to make a PoC in C++ next month if we are
> > willing to wait until then.
> >
> > On Fri, May 12, 2023 at 5:04 AM Gang Wu <us...@gmail.com> wrote:
> >
> > > I think we can wait for a complete PoC implementation of PARQUET-2261
> > > before release. BTW, I'd like to see the implementation from Micah to
> > fully
> > > understand the use case. If he is too busy to do that, I can do it
> based
> > on
> > > my understanding.
> > >
> > > Best,
> > > Gang
> > >
> > > On Fri, May 12, 2023 at 4:34 PM Gábor Szádovszky <ga...@apache.org>
> > wrote:
> > >
> > > > Thanks a lot for volunteering, Gang!
> > > >
> > > > However it is more than 2 years indeed since the last release I think
> > the
> > > > actual changes since then are more important. There are lots of
> > > > additions/corrections in the spec docs and the thrift file comments
> > which
> > > > are very important but not tightly attached to a format release. I
> only
> > > can
> > > > see PARQUET-2257 that contains an actual change in the thrift
> > structure.
> > > >
> > > > Related to the ongoing effort of PARQUET-2261: I think, we are
> waiting
> > > for
> > > > a PoC implementation. @emkornfield: Do you plan to work on this?
> > > >
> > > > The question is if we think PARQUET-2257 is urgent enough to not to
> > wait
> > > > for PARQUET-2261 and have an additional release after the latter is
> > ready
> > > > or we shall wait for the PoC implementation and release format after
> > it.
> > > >
> > > > On 2023/05/02 03:33:05 Gang Wu wrote:
> > > > > Thanks Fokko!
> > > > >
> > > > > Let us just wait for more inputs to see if it is good to proceed.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <fokko@apache.org
> >
> > > > wrote:
> > > > >
> > > > > > Hey Gang,
> > > > > >
> > > > > > Great bringing this up, I think that would be a great idea!
> > > > > >
> > > > > > Kind regards,
> > > > > > Fokko
> > > > > >
> > > > > > > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > The latest parquet format is v2.9.0 [1] which was released two
> > > years
> > > > ago.
> > > > > > > Is it a good time to release the next version? If there is no
> > > > objection,
> > > > > > I
> > > > > > > can
> > > > > > > volunteer to be the release manager.
> > > > > > >
> > > > > > > [1]
> > > https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Xinli Shang
>

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Xinli shang <sh...@uber.com.INVALID>.
Thanks, Gang, for taking the lead on this! I agree we should have a new
release. In addition to PARQUET-2261, there was also a discussion in Feb
with PMCs about PARQUET-758. We may want to check with Antoine
Pitrou <https://github.com/pitrou> on whether PARQUET-758 should be
included as well.



On Sat, May 13, 2023 at 9:51 AM Micah Kornfield <em...@gmail.com>
wrote:

> >
> >  BTW, I'd like to see the implementation from Micah to fully
> > understand the use case. If he is too busy to do that, I can do it based
> on
> > my understanding.
>
>
> I can allocate some time to try to make a PoC in C++ next month if we are
> willing to wait until then.
>
> On Fri, May 12, 2023 at 5:04 AM Gang Wu <us...@gmail.com> wrote:
>
> > I think we can wait for a complete PoC implementation of PARQUET-2261
> > before release. BTW, I'd like to see the implementation from Micah to
> fully
> > understand the use case. If he is too busy to do that, I can do it based
> on
> > my understanding.
> >
> > Best,
> > Gang
> >
> > On Fri, May 12, 2023 at 4:34 PM Gábor Szádovszky <ga...@apache.org>
> wrote:
> >
> > > Thanks a lot for volunteering, Gang!
> > >
> > > However it is more than 2 years indeed since the last release I think
> the
> > > actual changes since then are more important. There are lots of
> > > additions/corrections in the spec docs and the thrift file comments
> which
> > > are very important but not tightly attached to a format release. I only
> > can
> > > see PARQUET-2257 that contains an actual change in the thrift
> structure.
> > >
> > > Related to the ongoing effort of PARQUET-2261: I think, we are waiting
> > for
> > > a PoC implementation. @emkornfield: Do you plan to work on this?
> > >
> > > The question is if we think PARQUET-2257 is urgent enough to not to
> wait
> > > for PARQUET-2261 and have an additional release after the latter is
> ready
> > > or we shall wait for the PoC implementation and release format after
> it.
> > >
> > > On 2023/05/02 03:33:05 Gang Wu wrote:
> > > > Thanks Fokko!
> > > >
> > > > Let us just wait for more inputs to see if it is good to proceed.
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <fo...@apache.org>
> > > wrote:
> > > >
> > > > > Hey Gang,
> > > > >
> > > > > Great bringing this up, I think that would be a great idea!
> > > > >
> > > > > Kind regards,
> > > > > Fokko
> > > > >
> > > > > > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The latest parquet format is v2.9.0 [1] which was released two
> > years
> > > ago.
> > > > > > Is it a good time to release the next version? If there is no
> > > objection,
> > > > > I
> > > > > > can
> > > > > > volunteer to be the release manager.
> > > > > >
> > > > > > [1]
> > https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Xinli Shang

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Micah Kornfield <em...@gmail.com>.
>
>  BTW, I'd like to see the implementation from Micah to fully
> understand the use case. If he is too busy to do that, I can do it based on
> my understanding.


I can allocate some time to try to make a PoC in C++ next month if we are
willing to wait until then.

On Fri, May 12, 2023 at 5:04 AM Gang Wu <us...@gmail.com> wrote:

> I think we can wait for a complete PoC implementation of PARQUET-2261
> before release. BTW, I'd like to see the implementation from Micah to fully
> understand the use case. If he is too busy to do that, I can do it based on
> my understanding.
>
> Best,
> Gang
>
> On Fri, May 12, 2023 at 4:34 PM Gábor Szádovszky <ga...@apache.org> wrote:
>
> > Thanks a lot for volunteering, Gang!
> >
> > However it is more than 2 years indeed since the last release I think the
> > actual changes since then are more important. There are lots of
> > additions/corrections in the spec docs and the thrift file comments which
> > are very important but not tightly attached to a format release. I only
> can
> > see PARQUET-2257 that contains an actual change in the thrift structure.
> >
> > Related to the ongoing effort of PARQUET-2261: I think, we are waiting
> for
> > a PoC implementation. @emkornfield: Do you plan to work on this?
> >
> > The question is if we think PARQUET-2257 is urgent enough to not to wait
> > for PARQUET-2261 and have an additional release after the latter is ready
> > or we shall wait for the PoC implementation and release format after it.
> >
> > On 2023/05/02 03:33:05 Gang Wu wrote:
> > > Thanks Fokko!
> > >
> > > Let us just wait for more inputs to see if it is good to proceed.
> > >
> > > Best,
> > > Gang
> > >
> > > On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <fo...@apache.org>
> > wrote:
> > >
> > > > Hey Gang,
> > > >
> > > > Great bringing this up, I think that would be a great idea!
> > > >
> > > > Kind regards,
> > > > Fokko
> > > >
> > > > > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The latest parquet format is v2.9.0 [1] which was released two
> years
> > ago.
> > > > > Is it a good time to release the next version? If there is no
> > objection,
> > > > I
> > > > > can
> > > > > volunteer to be the release manager.
> > > > >
> > > > > [1]
> https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Gang Wu <us...@gmail.com>.
I think we can wait for a complete PoC implementation of PARQUET-2261
before release. BTW, I'd like to see the implementation from Micah to fully
understand the use case. If he is too busy to do that, I can do it based on
my understanding.

Best,
Gang

On Fri, May 12, 2023 at 4:34 PM Gábor Szádovszky <ga...@apache.org> wrote:

> Thanks a lot for volunteering, Gang!
>
> Although it has indeed been more than 2 years since the last release, I think
> the actual changes since then are more important. There are lots of
> additions/corrections in the spec docs and the thrift file comments which
> are very important but not tightly attached to a format release. I can only
> see PARQUET-2257 as containing an actual change in the thrift structure.
>
> Related to the ongoing effort of PARQUET-2261: I think we are waiting for
> a PoC implementation. @emkornfield: Do you plan to work on this?
>
> The question is whether we think PARQUET-2257 is urgent enough not to wait
> for PARQUET-2261 and have an additional release after the latter is ready,
> or whether we should wait for the PoC implementation and release the format
> after it.
>
> On 2023/05/02 03:33:05 Gang Wu wrote:
> > Thanks Fokko!
> >
> > Let us just wait for more inputs to see if it is good to proceed.
> >
> > Best,
> > Gang
> >
> > On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <fo...@apache.org>
> wrote:
> >
> > > Hey Gang,
> > >
> > > Great bringing this up, I think that would be a great idea!
> > >
> > > Kind regards,
> > > Fokko
> > >
> > > > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > The latest parquet format is v2.9.0 [1] which was released two years
> ago.
> > > > Is it a good time to release the next version? If there is no
> objection,
> > > I
> > > > can
> > > > volunteer to be the release manager.
> > > >
> > > > [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > > >
> > > > Best,
> > > > Gang
> > > >
> > >
> >
>

Re: [DISCUSS] Time to release parquet format 2.10.0?

Posted by Gábor Szádovszky <ga...@apache.org>.
Thanks a lot for volunteering, Gang!

Although it has indeed been more than 2 years since the last release, I think the actual changes since then are more important. There are lots of additions/corrections in the spec docs and the thrift file comments which are very important but not tightly attached to a format release. I can only see PARQUET-2257 as containing an actual change in the thrift structure.

Related to the ongoing effort of PARQUET-2261: I think we are waiting for a PoC implementation. @emkornfield: Do you plan to work on this?

The question is whether we think PARQUET-2257 is urgent enough not to wait for PARQUET-2261 and have an additional release after the latter is ready, or whether we should wait for the PoC implementation and release the format after it.

On 2023/05/02 03:33:05 Gang Wu wrote:
> Thanks Fokko!
> 
> Let us just wait for more inputs to see if it is good to proceed.
> 
> Best,
> Gang
> 
> On Fri, Apr 28, 2023 at 4:05 PM Fokko Driesprong <fo...@apache.org> wrote:
> 
> > Hey Gang,
> >
> > Great bringing this up, I think that would be a great idea!
> >
> > Kind regards,
> > Fokko
> >
> > On Thu, Apr 27, 2023 at 9:52 AM Gang Wu <us...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > The latest parquet format is v2.9.0 [1], which was released two years ago.
> > > Is it a good time to release the next version? If there is no objection, I
> > > can volunteer to be the release manager.
> > >
> > > [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
> > >
> > > Best,
> > > Gang
> > >
> >
>