Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/05/24 20:21:41 UTC

Re: [C++][Python] Highlighting some known problems with our Arrow C++ and Python packages

Would anyone have some bandwidth in the next couple of months to help
with this?

On Thu, Apr 30, 2020 at 9:10 AM Wes McKinney <we...@gmail.com> wrote:
>
> The proposal is for any BUNDLED dependency to be merged into
> libarrow.a (or into another one of the static libraries if the
> dependency is only used in, e.g., one subcomponent), so this applies
> to the AWS SDK as well.
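A minimal sketch of the merge described above, assuming GNU binutils: `ar -M` reads an MRI script in which ADDLIB copies every object file from an existing archive into the one being created. That is one way to fold bundled dependencies into a single libarrow.a. The helper names and library file names below are hypothetical, and this is not the code from ARROW-7605.

```python
"""Merge several static archives into one using a GNU ar MRI script.

Hypothetical helpers for illustration only; not the code from ARROW-7605.
The library file names used in the demo are made up.
"""
import subprocess
from typing import Sequence


def make_mri_script(output: str, inputs: Sequence[str]) -> str:
    """Return an MRI script that creates `output` holding every object in `inputs`."""
    if not inputs:
        raise ValueError("need at least one input archive")
    for name in (output, *inputs):
        # MRI scripts have no quoting syntax, so reject whitespace as a precaution.
        if any(ch.isspace() for ch in name):
            raise ValueError(f"unsupported path for an MRI script: {name!r}")
    lines = [f"CREATE {output}"]
    # ADDLIB copies all members of an existing archive into the new one.
    lines += [f"ADDLIB {lib}" for lib in inputs]
    # SAVE writes the new archive to disk; END leaves MRI mode.
    lines += ["SAVE", "END"]
    return "\n".join(lines) + "\n"


def merge_static_libs(output: str, inputs: Sequence[str], ar: str = "ar") -> None:
    """Feed the script to `ar -M` on stdin (GNU ar only; macOS ar lacks -M)."""
    script = make_mri_script(output, inputs)
    subprocess.run([ar, "-M"], input=script, text=True, check=True)


if __name__ == "__main__":
    # Hypothetical inputs: the Arrow archive plus two bundled dependencies.
    print(make_mri_script("libarrow_merged.a",
                          ["libarrow.a", "libjemalloc.a", "libzstd.a"]), end="")
```

On Linux, calling merge_static_libs with real archive paths would produce one archive that a downstream project can link without locating each bundled dependency separately. On macOS, `libtool -static -o out.a in1.a in2.a` plays the same role.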
>
> On Thu, Apr 30, 2020 at 3:02 AM Rémi Dettai <rd...@gmail.com> wrote:
> >
> > Hi!
> >
> > Does your point 1 also apply to the AWS SDK dependency? Currently it seems
> > that it cannot be built in BUNDLED mode. As stated in
> > https://issues.apache.org/jira/browse/ARROW-8565, I struggled a lot to make
> > a static build with the S3 dependency enabled! I would really like to
> > help with this because being able to assemble compact builds of Arrow is
> > very important for my use case, but I'm still very uncomfortable with
> > CMake :-(
> >
> > Thanks for your amazing work!
> >
> > Remi
> >
> > On Tue, Apr 28, 2020 at 4:22 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > > hi folks,
> > >
> > > I would like to highlight some outstanding problems with our packages
> > >
> > > 1. Our Arrow C++ static libraries are generally unusable.
> > >
> > > Whenever -DARROW_JEMALLOC=ON is set or any dependency is built in
> > > BUNDLED mode, libarrow.a (or the other static libraries) cannot be
> > > used for linking. That's because the static library depends on the
> > > bundled static libraries, which are _not_ packaged with the Arrow
> > > static libraries.
> > >
> > > The preferred solution seems to be ARROW-7605. I demonstrated how this
> > > works in
> > >
> > > https://github.com/apache/arrow/pull/6220
> > >
> > > but I need someone to help extend the PR to handle the other BUNDLED
> > > dependencies. I likely won't be able to complete the PR myself in time
> > > for the next release.
> > >
> > > 2. Our Python packages are unacceptably large.
> > >
> > > On Linux, wheels are now 64MB and take up 218MB after installation.
> > > There is an immediate, serious problem that has gone unresolved but
> > > is easier to fix, and a separate structural problem that is more
> > > difficult to fix. See the directory listing at
> > >
> > > https://gist.github.com/wesm/57bd99798a2fa23ef3cb5e4b18b5a248
> > >
> > > We're duplicating all of the shared libraries inside the wheel and on
> > > disk. It's unfortunate that we've allowed this problem to persist for
> > > a year or more:
> > >
> > > https://issues.apache.org/jira/browse/ARROW-5082
> > >
> > > I also recently opened
> > >
> > > https://issues.apache.org/jira/browse/ARROW-8518
> > >
> > > which describes a proposal to create some tools to assist with
> > > building "parent" and "child" Python packages. This would enable us to
> > > ship components like Flight and Gandiva as separate wheels. This is a
> > > large project but one that will ultimately be necessary for the
> > > long-term scalability and sustainability of the project.
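One way to check the duplication described in point 2 is to hash every member of a wheel, which is an ordinary zip archive, and group identical payloads. A minimal sketch, assuming the duplicates are byte-identical files stored under several names (for example an unversioned and a versioned copy of the same shared library, as can happen because wheels do not preserve symlinks). The function name is hypothetical, and this is not the fix tracked in ARROW-5082.

```python
"""Report byte-identical members of a wheel (a zip archive).

Hypothetical helper for illustration; not the ARROW-5082 fix.
"""
import hashlib
import sys
import zipfile
from collections import defaultdict


def find_duplicate_members(wheel_path):
    """Return (groups of identical member names, redundant uncompressed bytes)."""
    groups = defaultdict(list)
    sizes = {}
    with zipfile.ZipFile(wheel_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            h = hashlib.sha256()
            # Stream each member in 1 MiB chunks so large .so files are not
            # loaded into memory all at once.
            with zf.open(info) as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            groups[h.hexdigest()].append(info.filename)
            sizes[info.filename] = info.file_size
    dupes = sorted(sorted(names) for names in groups.values() if len(names) > 1)
    # Every copy beyond the first in each group is redundant.
    redundant = sum(sizes[n] for names in dupes for n in names[1:])
    return dupes, redundant


if __name__ == "__main__":
    # Usage: python find_dupes.py path/to/some.whl [more.whl ...]
    for path in sys.argv[1:]:
        dupes, redundant = find_duplicate_members(path)
        for names in dupes:
            print("identical:", ", ".join(names))
        print(f"{path}: {redundant / 2**20:.1f} MiB of redundant uncompressed data")
```

Hashing content rather than comparing names catches copies regardless of how they are named, at the cost of reading every member once.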
> > >
> > > I am not able to work on either of these projects personally in the
> > > current release cycle, but I hope that some progress can be made on
> > > them since they have lingered for a long time, and it would be
> > > good for us to "put our best foot forward" with the 1.0.0 release.
> > >
> > > Thanks,
> > > Wes
> > >