Posted to dev@arrow.apache.org by Kiran Padmanabhui <ki...@plastiq.com> on 2019/10/22 21:56:49 UTC

PyArrow Size in 0.15 version

Hi,

I am trying to use PyArrow in AWS Lambda, but the size is 190 MB, which is
too big for AWS Lambda.

Is there a way I can compile it and reduce it to less than 20-30 MB?

Thanks
Kiran

Re: PyArrow Size in 0.15 version

Posted by Wes McKinney <we...@gmail.com>.
Someone might try writing a "slim" version of

https://github.com/apache/arrow/blob/master/python/manylinux1/build_arrow.sh

That would be the place to start.

On Tue, Oct 22, 2019 at 5:17 PM Tim Paine <t....@gmail.com> wrote:
>
> Arrow has lots of configuration arguments, and PyArrow allows you to build certain subsets of Arrow’s functionality. Depending on what you need, you can probably start out by building from source and turning off Parquet, Plasma, and Gandiva support.
>
> When running cmake, use -DARROW_PARQUET=OFF -DARROW_PLASMA=OFF and -DARROW_GANDIVA=OFF.
>
> You can also try trimming the tests, and making sure not to include extra shared libraries.

Re: PyArrow Size in 0.15 version

Posted by Tim Paine <t....@gmail.com>.
Arrow has lots of configuration arguments, and PyArrow allows you to build certain subsets of Arrow’s functionality. Depending on what you need, you can probably start out by building from source and turning off Parquet, Plasma, and Gandiva support. 

When running cmake, pass -DARROW_PARQUET=OFF -DARROW_PLASMA=OFF -DARROW_GANDIVA=OFF.
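
As a rough illustration of those flags, here is a small Python sketch that only assembles the cmake command line and prints it; it does not run the build. The helper name cmake_command and the source path "arrow/cpp" are placeholders I made up for the example, and a real build will likely need further options beyond the three named here.

```python
import shlex

# Components to switch off, taken from the flags named in this message.
DISABLED_COMPONENTS = ["PARQUET", "PLASMA", "GANDIVA"]


def cmake_command(source_dir, disabled=DISABLED_COMPONENTS):
    """Return a cmake argument list with each listed component set to OFF."""
    flags = [f"-DARROW_{name}=OFF" for name in disabled]
    return ["cmake", *flags, source_dir]


if __name__ == "__main__":
    # "arrow/cpp" is a placeholder for wherever the Arrow C++ sources live.
    print(shlex.join(cmake_command("arrow/cpp")))
```

Keeping the disabled components in a list makes it easy to try different subsets and compare the resulting package size.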

You can also try trimming the tests and making sure not to include extra shared libraries.
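
To see which files are actually taking up the space before deciding what to drop, something like the following sketch may help. It walks a directory (for example the installed pyarrow package directory, which you would pass as the first argument) and lists the largest files. The function name file_sizes is my own, not part of PyArrow.

```python
import os
import sys


def file_sizes(root):
    """Return (size_in_bytes, path) pairs for regular files under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so a linked library is not counted twice
            sizes.append((os.path.getsize(path), path))
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Directory to inspect; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    sizes = file_sizes(root)
    total = sum(size for size, _ in sizes)
    print(f"total: {total / 1e6:.1f} MB")
    for size, path in sizes[:10]:
        print(f"{size / 1e6:8.1f} MB  {path}")
```

Running it before and after a rebuild gives a quick check on whether turning a component off actually removed its shared library from the package.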