Posted to user@spark.apache.org by Avishek Saha <av...@gmail.com> on 2014/06/27 02:45:00 UTC

numpy + pyspark

Hi all,

Instead of installing numpy on each worker node, is it possible to
ship numpy (via the --py-files option, maybe) when invoking
spark-submit?

Thanks,
Avishek
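For context, a minimal sketch of what --py-files can ship: Spark places each listed .py/.zip/.egg on the workers' PYTHONPATH, so a pure-Python package imports fine straight from a zip. numpy, however, bundles compiled C/Fortran extension modules (.so files) that Python's zip importer cannot load, which is why this route tends to break for it. The `mylib` package below is a made-up stand-in for a pure-Python dependency:

```python
# Build a deps.zip containing a tiny pure-Python package, then import it
# from the zip -- the same mechanism workers use for --py-files archives.
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mylib")          # hypothetical pure-Python package
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("ANSWER = 42\n")

zip_path = os.path.join(tmp, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.write(os.path.join(pkg, "__init__.py"), "mylib/__init__.py")

# Workers do the equivalent of this when `spark-submit --py-files deps.zip`
# is used: the zip goes on sys.path and zipimport loads modules from it.
sys.path.insert(0, zip_path)
import mylib
print(mylib.ANSWER)  # -> 42
```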

Re: numpy + pyspark

Posted by Avishek Saha <av...@gmail.com>.
Thanks for the great links, guys -- let me check out and try both!



Re: numpy + pyspark

Posted by Shannon Quinn <sq...@gatech.edu>.
I suppose along those lines, there's also Anaconda: 
https://store.continuum.io/cshop/anaconda/



Re: numpy + pyspark

Posted by Nick Pentreath <ni...@gmail.com>.
Hadoopy uses http://www.pyinstaller.org/ to package things up into an
executable that should be runnable without root privileges. It says it
supports numpy.


On Fri, Jun 27, 2014 at 5:08 PM, Shannon Quinn <sq...@gatech.edu> wrote:

>  Would deploying virtualenv on each directory on the cluster be viable?
> The dependencies would get tricky but I think this is the sort of situation
> it's built for.
>
>
> On 6/27/14, 11:06 AM, Avishek Saha wrote:
>
> I too felt the same Nick but I don't have root privileges on the cluster,
> unfortunately. Are there any alternatives?
>
>
> On 27 June 2014 08:04, Nick Pentreath <ni...@gmail.com> wrote:
>
>> I've not tried this - but numpy is a tricky and complex package with many
>> dependencies on Fortran/C libraries etc. I'd say by the time you figure out
>> correctly deploying numpy in this manner, you may as well have just built
>> it into your cluster bootstrap process, or PSSH install it on each node...
>>
>>
>> On Fri, Jun 27, 2014 at 4:58 PM, Avishek Saha <av...@gmail.com>
>> wrote:
>>
>>> To clarify I tried it and it almost worked -- but I am getting some
>>> problems from the Random module in numpy. If anyone has successfully passed
>>> a numpy module (via the --py-files option) to spark-submit then please let
>>> me know.
>>>
>>>  Thanks !!
>>> Avishek
>>>
>>>
>>> On 26 June 2014 17:45, Avishek Saha <av...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Instead of installing numpy in each worker node, is it possible to
>>>> ship numpy (via --py-files option maybe) while invoking the
>>>> spark-submit?
>>>>
>>>> Thanks,
>>>> Avishek
>>>>
>>>
>>>
>>
>
>

Re: numpy + pyspark

Posted by Shannon Quinn <sq...@gatech.edu>.
Would deploying a virtualenv in a directory on each node of the cluster 
be viable? The dependencies would get tricky, but I think this is the 
sort of situation it's built for.
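A minimal sketch of that idea, using the stdlib venv module as a stand-in for the virtualenv tool (the path is hypothetical; on a real cluster you would pick a directory that exists on every node, repeat the step per node, and then point Spark at the environment's interpreter via the PYSPARK_PYTHON environment variable):

```python
# Create a per-user virtual environment -- no root needed.
import os
import tempfile
import venv

# Hypothetical target path; in practice, something like /home/you/pyspark-env
# created on every node (e.g. over ssh).
env_dir = os.path.join(tempfile.mkdtemp(), "pyspark-env")
venv.create(env_dir, with_pip=False)   # with_pip=True also bootstraps pip
                                       # so you can `pip install numpy` into it

# The interpreter Spark should use; export PYSPARK_PYTHON to point at it.
interp = os.path.join(env_dir, "bin", "python")
print(os.path.exists(interp))  # -> True on POSIX systems
```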



Re: numpy + pyspark

Posted by Avishek Saha <av...@gmail.com>.
I felt the same, Nick, but unfortunately I don't have root privileges on
the cluster. Are there any alternatives?


On 27 June 2014 08:04, Nick Pentreath <ni...@gmail.com> wrote:

> I've not tried this - but numpy is a tricky and complex package with many
> dependencies on Fortran/C libraries etc. I'd say by the time you figure out
> correctly deploying numpy in this manner, you may as well have just built
> it into your cluster bootstrap process, or PSSH install it on each node...
>
>
> On Fri, Jun 27, 2014 at 4:58 PM, Avishek Saha <av...@gmail.com>
> wrote:
>
>> To clarify I tried it and it almost worked -- but I am getting some
>> problems from the Random module in numpy. If anyone has successfully passed
>> a numpy module (via the --py-files option) to spark-submit then please let
>> me know.
>>
>> Thanks !!
>> Avishek
>>
>>
>> On 26 June 2014 17:45, Avishek Saha <av...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Instead of installing numpy in each worker node, is it possible to
>>> ship numpy (via --py-files option maybe) while invoking the
>>> spark-submit?
>>>
>>> Thanks,
>>> Avishek
>>>
>>
>>
>

Re: numpy + pyspark

Posted by Nick Pentreath <ni...@gmail.com>.
I've not tried this -- but numpy is a tricky and complex package with many
dependencies on Fortran/C libraries, etc. I'd say by the time you figure out
how to deploy numpy correctly in this manner, you may as well have built it
into your cluster bootstrap process, or used PSSH to install it on each node...
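A rough sketch of the "install it on each node" route without root: run `pip install --user numpy` on every worker over ssh (which is roughly what the pssh tool automates, e.g. `pssh -h hosts.txt 'pip install --user numpy'`). The hostnames below are hypothetical, and the actual remote execution is left commented out:

```python
# Build the per-host install commands; --user installs under ~/.local,
# so no root privileges are required on the workers.
import subprocess

hosts = ["worker1", "worker2"]        # hypothetical worker hostnames
cmd = "pip install --user numpy"

def install_cmds(hosts, cmd):
    """Return one ssh argv per host that would run `cmd` remotely."""
    return [["ssh", h, cmd] for h in hosts]

for argv in install_cmds(hosts, cmd):
    print(" ".join(argv))
    # subprocess.check_call(argv)     # uncomment on a real cluster
```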


On Fri, Jun 27, 2014 at 4:58 PM, Avishek Saha <av...@gmail.com>
wrote:

> To clarify I tried it and it almost worked -- but I am getting some
> problems from the Random module in numpy. If anyone has successfully passed
> a numpy module (via the --py-files option) to spark-submit then please let
> me know.
>
> Thanks !!
> Avishek
>
>
> On 26 June 2014 17:45, Avishek Saha <av...@gmail.com> wrote:
>
>> Hi all,
>>
>> Instead of installing numpy in each worker node, is it possible to
>> ship numpy (via --py-files option maybe) while invoking the
>> spark-submit?
>>
>> Thanks,
>> Avishek
>>
>
>

Re: numpy + pyspark

Posted by Avishek Saha <av...@gmail.com>.
To clarify, I tried it and it almost worked -- but I am getting some
errors from numpy's random module. If anyone has successfully passed numpy
(via the --py-files option) to spark-submit, please let me know.

Thanks !!
Avishek
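That failure mode is consistent with numpy.random being backed by a compiled extension (mtrand), which the zip importer can't load even when the pure-Python parts of numpy import fine. One way to see exactly what each executor can import is to ship a tiny probe function; the rdd usage in the comment is hypothetical and needs a live SparkContext:

```python
def probe_numpy(_):
    """Return the numpy version visible to this Python process,
    or the import error message if numpy is unavailable."""
    try:
        import numpy
        return numpy.__version__
    except ImportError as exc:
        return "no numpy: %s" % exc

# On a live cluster (needs a running SparkContext `sc`):
#   sc.parallelize(range(8), 8).map(probe_numpy).distinct().collect()
# One distinct value means all executors agree; an error string pinpoints
# the nodes (or the shipped zip) where the import is failing.
print(probe_numpy(None))
```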


On 26 June 2014 17:45, Avishek Saha <av...@gmail.com> wrote:

> Hi all,
>
> Instead of installing numpy in each worker node, is it possible to
> ship numpy (via --py-files option maybe) while invoking the
> spark-submit?
>
> Thanks,
> Avishek
>