Posted to dev@arrow.apache.org by Weston Pace <we...@gmail.com> on 2023/05/18 17:04:12 UTC

[DISCUSS] Interest in a 12.0.1 patch?

Regrettably, 12.0.0 had a significant performance regression (I'll take the
blame for not thinking through all the use cases), most easily exposed when
writing datasets from pandas / numpy data, which is being addressed in
[1].  I believe this to be a fairly common use case and it may warrant a
12.0.1 patch.  Are there other issues that would need a patch?  Do we feel
this issue is significant enough to justify the work?

[1] https://github.com/apache/arrow/pull/35565

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Raúl Cumplido <ra...@gmail.com>.
I was planning to start working on the release as soon as the issue
was closed. I've seen there's another issue, related to security, which
is also open:
https://github.com/apache/arrow/milestone/54

We probably should also include that one.

On Fri, 26 May 2023 at 17:05, Neal Richardson
(<ne...@gmail.com>) wrote:
>
> Hi all, checking back in about the patch release. Do we have a timeline for
> when we plan to do it? Looks like Weston's PR is about ready to go, and I
> believe that was the last outstanding issue.
>
> Neal

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Neal Richardson <ne...@gmail.com>.
Hi all, checking back in about the patch release. Do we have a timeline for
when we plan to do it? Looks like Weston's PR is about ready to go, and I
believe that was the last outstanding issue.

Neal

On Fri, May 19, 2023 at 5:30 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> Sure!

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Sutou Kouhei <ko...@clear-code.com>.
Sure!

In <CA...@mail.gmail.com>
  "Re: [DISCUSS] Interest in a 12.0.1 patch?" on Fri, 19 May 2023 11:01:52 +0200,
  Raúl Cumplido <ra...@gmail.com> wrote:

> Hi,
> 
> Based on some conversations, I am +1 on creating the 12.0.1 release. I
> can work as release manager for the release, I'll need a PMC to sign
> and upload packages as usual. Kou, will you be able to help me with
> that?
> 
> Thanks,
> Raúl

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Raúl Cumplido <ra...@gmail.com>.
Hi,

Based on some conversations, I am +1 on creating the 12.0.1 release. I
can work as release manager for the release, I'll need a PMC to sign
and upload packages as usual. Kou, will you be able to help me with
that?

Thanks,
Raúl

On Thu, 18 May 2023 at 19:52, Will Jones
(<wi...@gmail.com>) wrote:
>
> Thanks for bringing this up Weston.
>
> Joris has already created a 12.0.1 milestone that contains several fixes
> that are candidates for backport [1], including this one. I think this is
> the most severe issue though.
>
> As a maintainer of the Python deltalake package, which uses the PyArrow
> Parquet writer and is often passed pandas data, I would appreciate a patch
> release.
>
> Best,
>
> Will Jones
>
> [1]
> https://github.com/apache/arrow/issues?q=is%3Aopen+is%3Aissue+milestone%3A12.0.1

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Will Jones <wi...@gmail.com>.
Thanks for bringing this up Weston.

Joris has already created a 12.0.1 milestone that contains several fixes
that are candidates for backport [1], including this one. I think this is
the most severe issue though.

As a maintainer of the Python deltalake package, which uses the PyArrow
Parquet writer and is often passed pandas data, I would appreciate a patch
release.

Best,

Will Jones

[1]
https://github.com/apache/arrow/issues?q=is%3Aopen+is%3Aissue+milestone%3A12.0.1

On Thu, May 18, 2023 at 10:18 AM Ian Cook <ia...@ursacomputing.com> wrote:

> There is also a major issue with the 12.0.0 R package that has now
> been fixed in the repo [2] and needs to be resubmitted to CRAN soon.
> The R package developers are supportive of a 12.0.1 patch release
> happening soon so that the resubmission of the R package to CRAN can
> also include the fix for the performance regression you mention.
>
> Ian
>
> [2] https://github.com/apache/arrow/pull/35612

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Matt Topol <zo...@gmail.com>.
I think it's worthwhile enough to justify the work for the patch. If we do
end up doing the patch, then we should also include this [1] change for the
Go side which, while significant, I didn't believe to be significant enough
to warrant a patch on its own. But it is definitely a good idea to include
this in a patch release if we're going to be doing one for other reasons.

--Matt

[1]: https://github.com/apache/arrow/issues/35337

On Thu, May 18, 2023 at 1:18 PM Ian Cook <ia...@ursacomputing.com> wrote:

> There is also a major issue with the 12.0.0 R package that has now
> been fixed in the repo [2] and needs to be resubmitted to CRAN soon.
> The R package developers are supportive of a 12.0.1 patch release
> happening soon so that the resubmission of the R package to CRAN can
> also include the fix for the performance regression you mention.
>
> Ian
>
> [2] https://github.com/apache/arrow/pull/35612

Re: [DISCUSS] Interest in a 12.0.1 patch?

Posted by Ian Cook <ia...@ursacomputing.com>.
There is also a major issue with the 12.0.0 R package that has now
been fixed in the repo [2] and needs to be resubmitted to CRAN soon.
The R package developers are supportive of a 12.0.1 patch release
happening soon so that the resubmission of the R package to CRAN can
also include the fix for the performance regression you mention.

Ian

[2] https://github.com/apache/arrow/pull/35612

On Thu, May 18, 2023 at 1:04 PM Weston Pace <we...@gmail.com> wrote:
>
> Regrettably, 12.0.0 had a significant performance regression (I'll take the
> blame for not thinking through all the use cases), most easily exposed when
> writing datasets from pandas / numpy data, which is being addressed in
> [1].  I believe this to be a fairly common use case and it may warrant a
> 12.0.1 patch.  Are there other issues that would need a patch?  Do we feel
> this issue is significant enough to justify the work?
>
> [1] https://github.com/apache/arrow/pull/35565