Posted to user@spark.apache.org by Shangyu Luo <ls...@gmail.com> on 2013/08/02 23:22:04 UTC

Some questions about spark-0.7.0

Hello, I am using the Python version of spark-0.7.0 and I have two questions
about it.
1. Does spark-0.7.0 support nested use of the map and reduce functions? For
example, can a function that is passed as a parameter to a map itself contain
a map?
2. Does spark-0.7.0 support setting up one master node and multiple worker
nodes on one computer by using different ports? I can set up one master node
and one worker node on one computer, but when I try to set up another worker
node I get a port 8081 conflict error.
Any answers or suggestions would be appreciated.
Thanks.


-- 
--

Shangyu, Luo
Department of Computer Science
Rice University

--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best

Re: Some questions about spark-0.7.0

Posted by Shangyu Luo <ls...@gmail.com>.
Thanks!



Re: Some questions about spark-0.7.0

Posted by Ian O'Connell <ia...@ianoconnell.com>.
There's a closure cleaner that attempts to find the variables you've
referenced in a closure so it can package them up with the closure it ships
around. It sends a copy of those, so you'll see serialization errors if you
use things from outside the closure's scope that aren't serializable (often
I/O objects).
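
As a rough sketch of what that means in PySpark (the names and path below
are made up for illustration): a plain module-level variable referenced
inside a map is pickled and shipped with each task, a non-picklable object
such as an open file handle makes the closure fail to serialize, and a
broadcast variable is the usual way to share larger read-only data.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "closure-demo")

    lookup = {"a": 1, "b": 2}               # global; a copy travels with every task
    rdd = sc.parallelize(["a", "b", "a"])
    print(rdd.map(lambda k: lookup[k]).collect())   # fine: a dict pickles cleanly

    log_file = open("/tmp/demo.log", "w")   # hypothetical path; file handles don't pickle
    # rdd.map(lambda k: log_file.write(k)).collect()   # would fail to serialize the closure

    big = sc.broadcast(lookup)              # read-only data shared as a broadcast variable
    print(rdd.map(lambda k: big.value[k]).collect())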



Re: Some questions about spark-0.7.0

Posted by Shangyu Luo <ls...@gmail.com>.
Thank you for your answers!
I am also wondering how Spark deals with global variables. Will it send a
copy of the global variables to each worker node?
Thanks.



Re: Some questions about spark-0.7.0

Posted by Ian O'Connell <ia...@ianoconnell.com>.
You can run local mode on a single machine, but to really run a master and
workers on one machine you'll need to use virtual machines or put up with a
lot of hassle. Of course such a setup has no performance benefit; it is only
useful for experimenting.
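
For experimenting, local mode is usually enough. A minimal sketch (the app
name is made up): it runs the driver and its worker threads in a single
process, so no standalone master or worker daemons have to be started.

    from pyspark import SparkContext

    # "local[4]" asks for 4 worker threads inside this one process
    sc = SparkContext("local[4]", "local-demo")
    total = sc.parallelize(range(100)).reduce(lambda a, b: a + b)
    print(total)   # 4950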

It depends on the maps involved: you cannot have nested RDDs or nested RDD
operations, if that's what you mean.
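
A minimal sketch of the distinction, with made-up data: referencing one RDD
inside a function that is mapped over another RDD does not work, because RDD
operations can only be issued from the driver. A mapped function that itself
uses the built-in Python map over ordinary local data is fine.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "nesting-demo")
    outer = sc.parallelize([1, 2, 3])
    inner = sc.parallelize([10, 20])

    # Not supported: 'inner' would have to be used inside a task on a worker.
    # outer.map(lambda x: inner.map(lambda y: x + y).collect()).collect()

    # Fine: the nested map is an ordinary Python map over local data.
    def add_offsets(x):
        return list(map(lambda y: x + y, [10, 20]))

    print(outer.map(add_offsets).collect())   # [[11, 21], [12, 22], [13, 23]]

    # Combining two RDDs is done with RDD-level operations on the driver
    # instead, e.g. outer.cartesian(inner), if your version provides it.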
