Posted to dev@arrow.apache.org by Krisztián Szűcs <sz...@gmail.com> on 2019/07/11 15:52:59 UTC

[Python] Wheel questions

Hi All,

I have a couple of questions about the wheel packaging:
- why do we build an arrow namespaced boost on linux and osx, could we link
statically like with the windows wheels?
- do we explicitly say somewhere in the linux wheels to link the 3rdparty
dependencies statically or just implicitly, by removing (or not building)
the shared libs for the 3rdparty dependencies?
- couldn't we use the 3rdparty toolchain to build the smaller 3rdparty
dependencies for the linux wheels instead of building them manually in the
manylinux docker image - it'd be easier to say <dependency>_SOURCE=BUNDLED
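
Roughly what I have in mind (untested, option names from memory):

    # configure Arrow's C++ build inside the manylinux image and let the
    # CMake ExternalProject machinery download and build the small
    # third-party dependencies itself
    cmake ../cpp -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_PYTHON=ON
    make -j4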

Regards, Krisztian

Re: [Python] Wheel questions

Posted by Antoine Pitrou <an...@python.org>.
On 12/07/2019 at 11:39, Uwe L. Korn wrote:
> Actually, the most pragmatic way I have thought of so far would be to use conda and build all our dependencies. Instead of using the compilers that the defaults and conda-forge channels use, we should build the dependencies in the manylinuxXXXX image and then upload them to a custom channel. This should also make maintenance of the arrow-manylinux docker container easier, as it won't then require a full recompile of LLVM just because you changed something in a preceding step.

That sounds cumbersome, though.  Each upgrade or modification to how those
libraries are built would require changing and updating some conda packages
somewhere...  So we would be trading one inconvenience for another.

Note I recently moved llvm and clang compilation up in the Dockerfile,
so most changes can now be done without recompiling them.
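
I.e. the layer ordering is now roughly like this (a sketch only, not the
actual Dockerfile; script names are made up):

    FROM quay.io/pypa/manylinux1_x86_64
    # expensive, rarely-touched steps first so Docker's layer cache keeps them
    ADD scripts/build_llvm.sh /
    RUN /build_llvm.sh
    # frequently edited steps afterwards; changing one of them only
    # rebuilds the layers from that point on
    ADD scripts/build_boost.sh /
    RUN /build_boost.sh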

Regards

Antoine.

Re: [Python] Wheel questions

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello,

On Thu, Jul 11, 2019, at 9:51 PM, Wes McKinney wrote:
> On Thu, Jul 11, 2019 at 11:26 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > On 11/07/2019 at 17:52, Krisztián Szűcs wrote:
> > > Hi All,
> > >
> > > I have a couple of questions about the wheel packaging:
> > > - why do we build an arrow namespaced boost on linux and osx, could we link
> > > statically like with the windows wheels?
> >
> > No idea.  Boost shouldn't leak in the public APIs, so theoretically a
> > static build would be fine...

Static linkage is fine as long as we don't expose any Boost symbols. We had that historically in the Decimal code. If that is gone, we can switch to static linkage.
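
Something like this should then suffice for the wheel builds (assuming the
option still exists under this name):

    # ask Arrow's CMake to link Boost statically instead of using shared libs
    cmake ../cpp -DARROW_BOOST_USE_SHARED=OFF -DARROW_PYTHON=ON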

> > > - do we explicitly say somewhere in the linux wheels to link the 3rdparty
> > > dependencies statically or just implicitly, by removing (or not building)
> > > the shared libs for the 3rdparty dependencies?
> >
> > It's implicit by removing the shared libs (or not building them).
> > Some time ago the compression libs were always linked statically by
> > default, but that was changed to dynamic at some point, probably to
> > please system packagers.
> 
> I think only the libz shared library is being bundled, for security reasons

Ah, yes. That was why we switched to dynamic linkage! Can you add a comment the next time you touch the build scripts?

> > > - couldn't we use the 3rdparty toolchain to build the smaller 3rdparty
> > > dependencies for the linux wheels instead of building them manually in the
> > > manylinux docker image - it'd be easier to say <dependency>_SOURCE=BUNDLED
> >
> > I don't think so.  The conda-forge and Anaconda packages use a different
> > build chain (different compiler, different libstdc++ version) and may
> > not be usable directly on manylinux-compliant systems.
> 
> I think you may misunderstand. Krisztian is suggesting building the
> dependencies through the ExternalProject mechanism during "docker run"
> on the image rather than caching pre-built versions in the Docker
> image.
> 
> For small dependencies, I don't see why we couldn't use the BUNDLED
> approach. This might spare us having to maintain some of the build
> scripts. It will strictly increase build times, though -- I think the
> reason that everything is cached now is to save on build times (which
> have historically been quite long)

Actually, the most pragmatic way I have thought of so far would be to use conda and build all our dependencies. Instead of using the compilers that the defaults and conda-forge channels use, we should build the dependencies in the manylinuxXXXX image and then upload them to a custom channel. This should also make maintenance of the arrow-manylinux docker container easier, as it won't then require a full recompile of LLVM just because you changed something in a preceding step.
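
Roughly the flow I imagine, with recipe and channel names invented just to
illustrate it:

    # inside the manylinuxXXXX image: build a dependency recipe with conda,
    # then publish the resulting package to our own channel
    conda build thrift-cpp-recipe/
    anaconda upload --user our-arrow-channel \
        $(conda build thrift-cpp-recipe/ --output)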

Uwe

Re: [Python] Wheel questions

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Jul 11, 2019 at 11:26 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> On 11/07/2019 at 17:52, Krisztián Szűcs wrote:
> > Hi All,
> >
> > I have a couple of questions about the wheel packaging:
> > - why do we build an arrow namespaced boost on linux and osx, could we link
> > statically like with the windows wheels?
>
> No idea.  Boost shouldn't leak in the public APIs, so theoretically a
> static build would be fine...

In principle the privately-namespaced Boost could be statically
linked. We are using bcp to change the C++ namespace of the symbols so
that our Boost symbols don't conflict with other wheels' Boost symbols
(which may have come from a different Boost version).
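
From memory, the namespacing step looks roughly like this (module list,
paths and output directory are only illustrative):

    # copy just the Boost subset Arrow needs, renaming the C++ namespace
    # from boost to arrow_boost so it cannot clash with other wheels
    mkdir -p /arrow_boost
    bcp --namespace=arrow_boost --namespace-alias \
        --boost=/boost_source filesystem regex system /arrow_boost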

I'll let Uwe comment further on the desire for dynamic linking

>
> > - do we explicitly say somewhere in the linux wheels to link the 3rdparty
> > dependencies statically or just implicitly, by removing (or not building)
> > the shared libs for the 3rdparty dependencies?
>
> It's implicit by removing the shared libs (or not building them).
> Some time ago the compression libs were always linked statically by
> default, but that was changed to dynamic at some point, probably to
> please system packagers.

I think only the libz shared library is being bundled, for security reasons

>
> > - couldn't we use the 3rdparty toolchain to build the smaller 3rdparty
> > dependencies for the linux wheels instead of building them manually in the
> > manylinux docker image - it'd be easier to say <dependency>_SOURCE=BUNDLED
>
> I don't think so.  The conda-forge and Anaconda packages use a different
> build chain (different compiler, different libstdc++ version) and may
> not be usable directly on manylinux-compliant systems.

I think you may misunderstand. Krisztian is suggesting building the
dependencies through the ExternalProject mechanism during "docker run"
on the image rather than caching pre-built versions in the Docker
image.

For small dependencies, I don't see why we couldn't use the BUNDLED
approach. This might spare us having to maintain some of the build
scripts. It will strictly increase build times, though -- I think the
reason that everything is cached now is to save on build times (which
have historically been quite long)

>
> Regards
>
> Antoine.

Re: [Python] Wheel questions

Posted by Antoine Pitrou <an...@python.org>.
On 11/07/2019 at 17:52, Krisztián Szűcs wrote:
> Hi All,
> 
> I have a couple of questions about the wheel packaging:
> - why do we build an arrow namespaced boost on linux and osx, could we link
> statically like with the windows wheels?

No idea.  Boost shouldn't leak in the public APIs, so theoretically a
static build would be fine...

> - do we explicitly say somewhere in the linux wheels to link the 3rdparty
> dependencies statically or just implicitly, by removing (or not building)
> the shared libs for the 3rdparty dependencies?

It's implicit by removing the shared libs (or not building them).
Some time ago the compression libs were always linked statically by
default, but that was changed to dynamic at some point, probably to
please system packagers.
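
Concretely, I mean something along these lines in the image build scripts
(library name purely illustrative):

    # for an autotools-based dependency: don't build the shared library at all
    ./configure --disable-shared --enable-static --prefix=/usr/local
    make -j4 && make install
    # or remove the shared objects afterwards, so that CMake can only
    # find and link the static .a
    rm -f /usr/local/lib/libfoo.so*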

> - couldn't we use the 3rdparty toolchain to build the smaller 3rdparty
> dependencies for the linux wheels instead of building them manually in the
> manylinux docker image - it'd be easier to say <dependency>_SOURCE=BUNDLED

I don't think so.  The conda-forge and Anaconda packages use a different
build chain (different compiler, different libstdc++ version) and may
not be usable directly on manylinux-compliant systems.

Regards

Antoine.