Posted to dev@arrow.apache.org by Joris Van den Bossche <jo...@gmail.com> on 2020/06/02 08:11:25 UTC

Re: Why downloading sources of pyarrow and its requirements takes several minutes?

I think this is because numpy has shipped a pyproject.toml file since
1.18 (https://github.com/numpy/numpy/pull/14053).
Apparently, when a package includes a pyproject.toml, pip creates a
build environment just to get the metadata (in numpy's case, this
means creating an environment with the setuptools, wheel and cython
packages installed). That is what takes the extra time compared to older
versions of numpy.
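
To see which source distributions in a download set would go through that
build-environment step, one can check each tarball for a top-level
pyproject.toml. The following is a minimal sketch, not pip's own logic: the
helper names sdist_has_pyproject and make_fake_sdist are hypothetical, and
the archives are built in memory purely for illustration. It assumes the
usual sdist layout of <name>-<version>/... inside the tarball.

```python
import io
import tarfile


def sdist_has_pyproject(fileobj):
    """Return True if the sdist tarball has a top-level pyproject.toml.

    Hypothetical helper: assumes files live under <name>-<version>/.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for name in tar.getnames():
            parts = name.split("/")
            # Only match <name>-<version>/pyproject.toml, not nested copies.
            if len(parts) == 2 and parts[1] == "pyproject.toml":
                return True
    return False


def make_fake_sdist(filenames):
    """Build an in-memory .tar.gz of empty files (illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for filename in filenames:
            info = tarfile.TarInfo(name=filename)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return buf


# An sdist with only setup.py versus one that also ships pyproject.toml.
old_style = make_fake_sdist(["pkg-1.0/setup.py"])
new_style = make_fake_sdist(["pkg-1.0/setup.py", "pkg-1.0/pyproject.toml"])
print(sdist_has_pyproject(old_style))  # False
print(sdist_has_pyproject(new_style))  # True
```

For a real check, the same function could be given an open file handle to a
downloaded tarball such as the ones pip places in the --dest directory.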

On Fri, 29 May 2020 at 20:02, Valentyn Tymofieiev
<va...@google.com.invalid> wrote:

> Thanks for the input. Opened
> https://issues.apache.org/jira/browse/ARROW-8983, we can continue the
> conversation there.
>
> On Thu, May 28, 2020 at 2:46 PM Valentyn Tymofieiev <va...@google.com>
> wrote:
>
> > Hi Arrow dev community,
> >
> > Do you have any insight why
> >
> >           python -m pip download --dest /tmp pyarrow==0.16.0 --no-binary :all:
> >
> > takes several minutes to execute? From the output we can see that pip gets
> > stuck on:
> >
> >   File was already downloaded /tmp/pyarrow-0.16.0.tar.gz
> >   Installing build dependencies ... |
> >
> > There is a significant increase in runtime between 0.15.1 and 0.16.0. I
> > suspect some build dependencies need to be installed before pip
> > understands the dependencies of pyarrow.  Is there some inefficiency in
> > pyarrow's setup.py that is causing this?
> >
> > Thanks,
> > Valentyn
> >
>