Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/09/04 22:23:22 UTC

[DISCUSS] Dropping support for CentOS 5 / RHEL5 in Python packages

hi folks,

Surfacing a JIRA discussion ([4]) to the mailing list for wider input.

The manylinux1 ABI was developed to provide a mechanism for portable
Python packages with pre-compiled binary extensions supporting C and
C++, including C++11, on a wide variety of Linux distributions without
need for distribution-specific packages. This is accomplished using
Red Hat's devtoolset-2, which performs selective static linking of
symbols from libstdc++ that cause ABI conflicts when used on systems
with older standard libraries.

The base image for producing these binaries is specified in a Dockerfile [1].

The problem that we are having is that some C++ libraries, notably
Google's Abseil C++ library, require a version of glibc that is too
new for RHEL5. By building with CentOS6 / RHEL6 as the base image, we
would get a new enough glibc (version 2.12). But building against
glibc 2.12 would leave behind the RHEL5 folks.
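
To make the glibc constraint concrete, here is a minimal sketch (not part
of our build tooling) of comparing a host's glibc with the version a
binary was built against. The helper names are hypothetical; the only
facts used are that CentOS 5 ships glibc 2.5 and CentOS 6 ships glibc
2.12.

```python
import platform
import re

# glibc versions of the two candidate base images.
CENTOS5_GLIBC = (2, 5)    # manylinux1 baseline
CENTOS6_GLIBC = (2, 12)   # what a CentOS 6 based build would link against


def parse_version(text):
    """Turn a string such as '2.12' into the integer tuple (2, 12)."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("not a glibc version string: %r" % text)
    return int(match.group(1)), int(match.group(2))


def can_load(system_glibc, built_against):
    """Conservative check: the host glibc must be at least as new as the
    glibc the binary was built against."""
    return system_glibc >= built_against


def detect_glibc():
    """Return the running interpreter's glibc version, or None if the
    interpreter is not linked against glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version)


if __name__ == "__main__":
    host = detect_glibc()
    if host is None:
        print("no glibc detected on this host")
    else:
        print("host glibc:", host)
        print("can load a CentOS 6 built wheel:", can_load(host, CENTOS6_GLIBC))
```

Versions are compared as integer tuples because a plain string
comparison would rank "2.5" above "2.12".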

The manylinux2010 standard, still under discussion, uses RHEL6 as its
base, but it is not yet finalized or in production.

Some modern C++ projects shipping to Python have already left behind
the manylinux1 standard even though their Python binaries claim to
implement the standard. Both PyTorch and TensorFlow are tagged as
manylinux1 although they have a different ABI. See [2] and [3] for
examples.
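
As a rough illustration of what "a different ABI" means here, the
sketch below scans versioned symbol names for GLIBC_x.y tags and
compares the highest one with the glibc 2.5 ceiling of manylinux1. It
is a simplified assumption of how such a check could look, not how any
of the projects above verify their wheels; the input lines are assumed
to come from a dynamic symbol listing such as the output of
`objdump -T`, and the function names are made up for this example. The
real policy also restricts other libraries (libstdc++, libgcc), which
this sketch ignores; auditwheel is the tool that does this properly.

```python
import re

# manylinux1 is built on CentOS 5, so glibc symbols newer than 2.5
# are not guaranteed to exist on a conforming system.
MANYLINUX1_MAX_GLIBC = (2, 5)

# Matches tags such as GLIBC_2.14 or GLIBC_2.2.5 (patch level ignored).
# GLIBCXX_... tags are deliberately not matched.
_GLIBC_TAG = re.compile(r"GLIBC_(\d+)\.(\d+)")


def max_glibc_required(symbol_lines):
    """Return the highest GLIBC (major, minor) found, or None if none."""
    versions = [
        (int(m.group(1)), int(m.group(2)))
        for line in symbol_lines
        for m in _GLIBC_TAG.finditer(line)
    ]
    return max(versions) if versions else None


def conforms_to_manylinux1(symbol_lines):
    """True if no referenced glibc symbol is newer than the 2.5 ceiling."""
    required = max_glibc_required(symbol_lines)
    return required is None or required <= MANYLINUX1_MAX_GLIBC


if __name__ == "__main__":
    sample = ["malloc@GLIBC_2.2.5", "memcpy@GLIBC_2.14"]
    print(max_glibc_required(sample))
    print(conforms_to_manylinux1(sample))
```

A wheel can carry the manylinux1 tag in its filename and still fail a
check like this, which is the situation described above.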

In my view there are two paths forward, neither perfect:

1) Stick with the manylinux1 ABI and do not use third-party libraries
requiring a newer glibc.
2) "Cheat" on manylinux1 by using CentOS 6 instead of CentOS 5 as the
base image for the wheel builds. This is what PyTorch is doing.

Since CentOS 5 / RHEL5 are already past EOL, users on those systems
would be the primary casualties, but I'm not sure how many users would
be affected. My guess is that they represent a small minority of our
users at this point. Red Hat is offering extended support for RHEL5
through the end of 2020, but those are probably fairly exceptional
cases and unlikely (IMHO) to be working on the bleeding edge of Python
data engineering.

Personally I would like to go with Option 2 and hope that this
particular Python packaging problem gets sorted out in the next 12-24
months, as we've already suffered problems due to TensorFlow's and
PyTorch's non-conformity with the manylinux1 ABI.

Interested in the opinions of others.

- Wes

[1]: https://github.com/pypa/manylinux/blob/master/docker/Dockerfile-x86_64
[2]: https://github.com/NVIDIA/nvidia-docker/issues/348#issuecomment-288875848
[3]: https://github.com/pypa/manylinux/issues/96
[4]: https://issues.apache.org/jira/browse/ARROW-2461

Re: [DISCUSS] Dropping support for CentOS 5 / RHEL5 in Python packages

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Wes,

I'm OK with option 2 if we use the not-yet-finalized manylinux2010 image as the base. This way, we will still be able to produce wheels that, in the near future, are actually based on an architecture tag supported by a PEP. Also, as I already have some packaging nightmares, I would feel much better if we first get a release out that features parquet-cpp merged into the main Arrow tree before we switch the manylinux* base image.
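
For readers less familiar with wheel naming: the tag in question is the last dash-separated field of the wheel filename, as laid out in PEP 427. A minimal sketch of reading it follows; the function names and the example filenames are only illustrative, and it handles just the manylinux1 / manylinux2010 style of tag discussed in this thread.

```python
import re


def wheel_platform_tag(filename):
    """Return the platform tag (last field) of a wheel filename.

    PEP 427 layout:
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    fields = filename[: -len(".whl")].split("-")
    if len(fields) not in (5, 6):
        raise ValueError("unexpected number of fields in %r" % filename)
    return fields[-1]


def manylinux_flavor(platform_tag):
    """Return 'manylinux1', 'manylinux2010', ... or None for other tags."""
    match = re.match(r"(manylinux\d+)_", platform_tag)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative filename, not a statement about any published wheel.
    name = "pyarrow-0.10.0-cp36-cp36m-manylinux1_x86_64.whl"
    tag = wheel_platform_tag(name)
    print(tag, manylinux_flavor(tag))
```

The tag is only a claim made by the filename; it says nothing about which glibc the binaries inside were actually built against.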

Uwe

On Wed, Sep 5, 2018, at 1:22 AM, Ted Dunning wrote:
> Just as a point of reference, I don't think that we get any pushback at MapR
> for not supporting RHEL 5 and that has been our policy for a few years now.
> 
> That experience should be pretty similar for Arrow, except that I would
> expect that new adoptions might be even more canted towards current
> versions.

Re: [DISCUSS] Dropping support for CentOS 5 / RHEL5 in Python packages

Posted by Ted Dunning <te...@gmail.com>.
Just as a point of reference, I don't think that we get any pushback at MapR
for not supporting RHEL 5 and that has been our policy for a few years now.

That experience should be pretty similar for Arrow, except that I would
expect that new adoptions might be even more canted towards current
versions.



