Posted to dev@arrow.apache.org by "Uwe L. Korn" <ma...@uwekorn.com> on 2020/12/04 20:11:12 UTC

Incompatibility of all existing pyarrow releases with the next NumPy release

Hello all,

Today the Kartothek CI turned quite red in https://github.com/JDASoftwareGroup/kartothek/pull/383 / https://github.com/JDASoftwareGroup/kartothek/pull/383/checks?check_run_id=1497941813 when the new NumPy 1.20rc1 was pulled in. It broke all pyarrow<->NumPy interop, because the dtypes returned by NumPy are now instances of numpy.dtype subclasses rather than of numpy.dtype itself. I reported the issue at https://github.com/numpy/numpy/issues/17913. We run into this because we build our wheels and conda packages against an older NumPy release whose implementation of PyArray_DescrCheck is faulty.
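The failure mode can be illustrated from Python. Our reading of the problem (an assumption, not taken from the pyarrow or NumPy sources) is that the faulty macro compares an object's type exactly against numpy.dtype, while the fixed macro (NumPy 1.16.6 and later) also accepts subclasses. The helpers `exact_descr_check` and `subclass_aware_descr_check` are hypothetical analogues of those two behaviours, not pyarrow code.

```python
import numpy as np


def exact_descr_check(obj) -> bool:
    # Hypothetical analogue of the faulty check: only an object whose type
    # is exactly numpy.dtype passes; subclasses are rejected.
    return type(obj) is np.dtype


def subclass_aware_descr_check(obj) -> bool:
    # Hypothetical analogue of the fixed check: numpy.dtype and any of its
    # subclasses pass.
    return isinstance(obj, np.dtype)


dt = np.dtype("int64")
print(np.__version__, type(dt))
# On NumPy >= 1.20, type(dt) is a dtype subclass, so only the
# subclass-aware check accepts it; on older NumPy both checks pass.
print("exact:", exact_descr_check(dt))
print("subclass-aware:", subclass_aware_descr_check(dt))
```

An extension compiled with the exact check therefore sees a perfectly valid dtype as "not a dtype", which matches the "Did not pass numpy.dtype object" error quoted below.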

 (a) For upcoming releases, we can either raise our minimum supported NumPy to 1.16.6 or merge the PR at https://github.com/apache/arrow/pull/8834
 (b) Existing conda(-forge) packages can get a repodata patch that adds a numpy<1.20 constraint to them
 (c) I'll rebuild the most recent, still frequently used pyarrow releases on conda-forge against NumPy 1.16.6
 (d) Old pyarrow wheels (Python<3.8), though, won't be easily fixed and will instead fail with the confusing "ArrowTypeError: Did not pass numpy.dtype object" error message. Personally, my approach here would be to do nothing and simply direct users to downgrade NumPy if they run into the issue.
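Item (b) could look roughly like the following. This is a simplified, hypothetical sketch of the kind of edit a repodata patch makes (tightening the `numpy` entry in each pyarrow record's `depends` list); it is not the conda-forge repodata-patches feedstock's actual code, and the helper names and sample filename are made up.

```python
import copy


def add_numpy_upper_bound(spec: str, bound: str = "<1.20") -> str:
    """Tighten a simple conda match spec such as 'numpy' or 'numpy >=1.16.6'.

    Specs with a build string or an OR ('|') clause are returned unchanged,
    since appending a bound to them is not straightforward.
    """
    parts = spec.split()
    if len(parts) == 1:
        return f"{parts[0]} {bound}"
    if len(parts) == 2 and "|" not in parts[1]:
        # In conda version specs, ',' means AND.
        return f"{parts[0]} {parts[1]},{bound}"
    return spec


def patch_pyarrow_records(repodata: dict, bound: str = "<1.20") -> dict:
    """Return a copy of repodata in which every pyarrow record's numpy
    dependency carries an extra upper bound."""
    patched = copy.deepcopy(repodata)
    for key in ("packages", "packages.conda"):
        for record in patched.get(key, {}).values():
            if record.get("name") != "pyarrow":
                continue
            record["depends"] = [
                add_numpy_upper_bound(dep, bound)
                if dep.split()[:1] == ["numpy"]
                else dep
                for dep in record.get("depends", [])
            ]
    return patched


# Illustrative input; the filename and dependency list are hypothetical.
repodata = {
    "packages": {
        "pyarrow-2.0.0-example_0.tar.bz2": {
            "name": "pyarrow",
            "version": "2.0.0",
            "depends": ["numpy >=1.16.6", "python"],
        }
    }
}
patched = patch_pyarrow_records(repodata)
print(patched["packages"]["pyarrow-2.0.0-example_0.tar.bz2"]["depends"])
# ['numpy >=1.16.6,<1.20', 'python']
```

Copying the repodata rather than mutating it keeps the original index available for comparison, which is how one would review such a patch before publishing it.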

Is anyone objecting to this approach?

Cheers
Uwe

Re: Incompatibility of all existing pyarrow releases with the next NumPy release

Posted by Wes McKinney <we...@gmail.com>.
I believe we can do a release that is just focused on the Python
artifacts, yes.

On Mon, Dec 7, 2020 at 6:52 AM Joris Van den Bossche
<jo...@gmail.com> wrote:
>
> On Fri, 4 Dec 2020 at 21:11, Uwe L. Korn <ma...@uwekorn.com> wrote:
> >  (d) Old pyarrow wheels (Python<3.8), though, won't be easily fixed and
> > will instead fail with the confusing "ArrowTypeError: Did not pass
> > numpy.dtype object" error message. Personally, my approach here would be
> > to do nothing and simply direct users to downgrade NumPy if they run into
> > the issue.
> >
> In addition to this last item (pip installs), doing a small 2.0.1 bugfix
> release with this patch would also help a lot, I think. It would at least
> ensure that plain pip installs of the latest versions work (although, of
> course, it doesn't solve the problem for older pyarrow releases, in case
> people upgrade NumPy in an existing environment or install NumPy with
> pyarrow pinned to an older version).
>
> Does our project governance allow doing a Python-only release? (meaning a
> release branch where the 2.0.1 tag, compared to the 2.0.0 tag, only
> includes changes to the Python libraries) That would make it less
> burdensome to resolve part of this situation.

Re: Incompatibility of all existing pyarrow releases with the next NumPy release

Posted by Joris Van den Bossche <jo...@gmail.com>.
On Fri, 4 Dec 2020 at 21:11, Uwe L. Korn <ma...@uwekorn.com> wrote:

>  (d) Old pyarrow wheels (Python<3.8), though, won't be easily fixed and
> will instead fail with the confusing "ArrowTypeError: Did not pass
> numpy.dtype object" error message. Personally, my approach here would be
> to do nothing and simply direct users to downgrade NumPy if they run into
> the issue.
>
In addition to this last item (pip installs), doing a small 2.0.1 bugfix
release with this patch would also help a lot, I think. It would at least
ensure that plain pip installs of the latest versions work (although, of
course, it doesn't solve the problem for older pyarrow releases, in case
people upgrade NumPy in an existing environment or install NumPy with
pyarrow pinned to an older version).

Does our project governance allow doing a Python-only release? (meaning a
release branch where the 2.0.1 tag, compared to the 2.0.0 tag, only
includes changes to the Python libraries) That would make it less
burdensome to resolve part of this situation.


Re: Incompatibility of all existing pyarrow releases with the next NumPy release

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
NumPy's deprecation policy drops support for the 1.16 series in January (https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table). I would therefore suggest raising the minimum NumPy version used in our builds to 1.17; we will also raise the version used for builds on conda-forge.
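The January cut-off can be reproduced with a small sketch. It assumes NEP 29's rule as we read it (support every minor NumPy version released in the prior 24 months, and at least the three latest minor versions); the helper names are hypothetical and the release dates are approximate, for illustration only. The NEP's support table remains the authoritative source.

```python
from datetime import date


def two_years_before(day: date) -> date:
    """The same calendar day 24 months earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - 2)
    except ValueError:
        return day.replace(year=day.year - 2, day=28)


def min_supported_numpy(releases: dict, today: date) -> tuple:
    """Oldest NumPy minor version to support on `today` under a NEP 29-style
    rule: every minor release from the prior 24 months, and never fewer than
    the three latest minor releases."""
    cutoff = two_years_before(today)
    released = sorted(v for v, d in releases.items() if d <= today)
    recent = {v for v in released if releases[v] > cutoff}
    supported = recent | set(released[-3:])
    return min(supported)


# Approximate x.y.0 release dates, assumed for illustration only.
NUMPY_MINOR_RELEASES = {
    (1, 16): date(2019, 1, 13),
    (1, 17): date(2019, 7, 26),
    (1, 18): date(2019, 12, 22),
    (1, 19): date(2020, 6, 20),
}

print(min_supported_numpy(NUMPY_MINOR_RELEASES, date(2020, 12, 4)))  # (1, 16)
print(min_supported_numpy(NUMPY_MINOR_RELEASES, date(2021, 1, 14)))  # (1, 17)
```

The "at least three latest" clause only matters when releases slow down; with these dates, the 24-month window alone moves the floor from 1.16 to 1.17 in January 2021.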

Still, the PR is so trivial that we should merge it. I'm not up to date on the status of the 2.0.1 release, but this would be an essential patch for it.

On Fri, Dec 4, 2020, at 9:22 PM, Antoine Pitrou wrote:
> 
> 
> On 04/12/2020 at 21:11, Uwe L. Korn wrote:
> >  (a) For upcoming releases, we can either raise our minimum supported NumPy to 1.16.6 or merge the PR at https://github.com/apache/arrow/pull/8834
> 
> I would be fine with merging the PR (assuming comments are added to
> explain why things are done that way). Apparently NumPy 1.16.6 is only
> one year old.
> 
> Regards
> 
> Antoine.
>

Re: Incompatability of all existing pyarrow releases with the next NumPy release

Posted by Antoine Pitrou <an...@python.org>.

On 04/12/2020 at 21:11, Uwe L. Korn wrote:
>  (a) For upcoming releases, we can either raise our minimum supported NumPy to 1.16.6 or merge the PR at https://github.com/apache/arrow/pull/8834

I would be fine with merging the PR (assuming comments are added to
explain why things are done that way). Apparently NumPy 1.16.6 is only
one year old.

Regards

Antoine.