Posted to user@spark.apache.org by Manoj Samel <ma...@gmail.com> on 2014/01/24 18:59:49 UTC

Division of work between master, worker, executor and driver

On a cluster with HDFS + Spark (in standalone deploy mode), there is a master
node + 4 worker nodes. When a spark-shell connects to the master, it creates 4
executor JVMs on each of the 4 worker nodes.

When the application reads HDFS files and does computations in RDDs, what
work gets done on the master, worker, executor and driver?

Thanks,

Re: Division of work between master, worker, executor and driver

Posted by Tathagata Das <ta...@gmail.com>.
Master and Worker are components of Spark's standalone cluster manager,
which manages the available resources in a cluster and divides them between
different Spark applications.
A Spark application's Driver asks the Master for resources. The Master
allocates certain Workers to the application. Those Workers start running
Executors to do the processing for that application. The Driver then sends
tasks directly to those Executors.

This is explained pictorially here:
http://spark.incubator.apache.org/docs/latest/cluster-overview.html . Maybe
this will help explain it better.

TD
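
To make this concrete, here is a minimal Scala sketch of a standalone-mode
application, assuming a Spark version with the SparkConf API, a hypothetical
Master at spark://master:7077, and a hypothetical input file at
hdfs://namenode:8020/data/input.txt:

// Runs in the driver JVM (for spark-shell, the shell process itself): it
// builds the RDD lineage and schedules tasks, but does no bulk data work.
import org.apache.spark.{SparkConf, SparkContext}

object DivisionOfWorkDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("division-of-work-demo")
      .setMaster("spark://master:7077")     // standalone Master: allocates Workers, runs no tasks
      .set("spark.executor.memory", "4g")   // heap for each executor JVM a Worker launches
    val sc = new SparkContext(conf)         // registers the application with the Master

    // Transformations are only recorded by the driver; nothing runs on the cluster yet.
    val lines  = sc.textFile("hdfs://namenode:8020/data/input.txt")
    val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)

    // The action makes the driver submit tasks to the executors; the executors
    // on the worker nodes read their HDFS partitions and do the actual computation.
    counts.take(10).foreach(println)

    sc.stop()
  }
}

The Master and Worker daemons themselves never run tasks or read the data;
they only allocate resources and launch/supervise the executor JVMs.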



On Fri, Jan 24, 2014 at 9:59 AM, Manoj Samel <ma...@gmail.com> wrote:

> On a cluster with HDFS + Spark (in standalone deploy mode), there is a
> master node + 4 worker nodes. When a spark-shell connects to the master, it
> creates 4 executor JVMs on each of the 4 worker nodes.
>
> When the application reads HDFS files and does computations in RDDs,
> what work gets done on the master, worker, executor and driver?
>
> Thanks,
>

Re: Division of work between master, worker, executor and driver

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Correct.


On Sun, Jan 26, 2014 at 6:30 PM, Manoj Samel <ma...@gmail.com> wrote:

> Yes, that's what I meant (thanks for the correction).
>
> From the tests run, it seems best to start workers with the default memory
> (or a bit higher) and give most of the memory to the executors, since most
> of the work will be done in the executor JVM and the worker JVM acts more
> like a node manager for that node.
>
>
> On Sat, Jan 25, 2014 at 6:32 AM, Archit Thakur <ar...@gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Jan 24, 2014 at 11:29 PM, Manoj Samel <ma...@gmail.com> wrote:
>>
>>> On a cluster with HDFS + Spark (in standalone deploy mode), there is a
>>> master node + 4 worker nodes. When a spark-shell connects to the master, it
>>> creates 4 executor JVMs on each of the 4 worker nodes.
>>>
>>
>> No, it creates 1 executor JVM on each of the 4 worker nodes (4 in total).
>>
>>>
>>> When the application reads HDFS files and does computations in RDDs,
>>> what work gets done on the master, worker, executor and driver?
>>>
>>> Thanks,
>>>
>>
>>
>

Re: Division of work between master, worker, executor and driver

Posted by Manoj Samel <ma...@gmail.com>.
Yes, that's what I meant (thanks for the correction).

From the tests run, it seems best to start workers with the default memory
(or a bit higher) and give most of the memory to the executors, since most
of the work will be done in the executor JVM and the worker JVM acts more
like a node manager for that node.
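
A hedged sketch of that split, with made-up numbers for a worker node with
roughly 32 GB of RAM: the standalone daemons are sized in conf/spark-env.sh
on each worker node, and the application asks for its per-node executor heap
through spark.executor.memory.

// conf/spark-env.sh on each worker node (illustrative values only):
//   SPARK_DAEMON_MEMORY=512m  -> heap for the Worker daemon itself (supervision only)
//   SPARK_WORKER_MEMORY=28g   -> total memory the Worker may hand out to executors
//
// The application then requests its executor heap, which must fit within
// SPARK_WORKER_MEMORY:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("memory-split-demo")
  .set("spark.executor.memory", "24g")  // most of the node's memory goes to the executor JVM
val sc = new SparkContext(conf)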


On Sat, Jan 25, 2014 at 6:32 AM, Archit Thakur <ar...@gmail.com> wrote:

>
>
>
> On Fri, Jan 24, 2014 at 11:29 PM, Manoj Samel <ma...@gmail.com> wrote:
>
>> On a cluster with HDFS + Spark (in standalone deploy mode), there is a
>> master node + 4 worker nodes. When a spark-shell connects to the master, it
>> creates 4 executor JVMs on each of the 4 worker nodes.
>>
>
> No, it creates 1 executor JVM on each of the 4 worker nodes (4 in total).
>
>>
>> When the application reads HDFS files and does computations in RDDs,
>> what work gets done on the master, worker, executor and driver?
>>
>> Thanks,
>>
>
>

Re: Division of work between master, worker, executor and driver

Posted by Archit Thakur <ar...@gmail.com>.
On Fri, Jan 24, 2014 at 11:29 PM, Manoj Samel <ma...@gmail.com> wrote:

> On a cluster with HDFS + Spark (in standalone deploy mode), there is a
> master node + 4 worker nodes. When a spark-shell connects to the master, it
> creates 4 executor JVMs on each of the 4 worker nodes.
>

No, it creates 1 executor JVM on each of the 4 worker nodes (4 in total).

>
> When the application reads HDFS files and does computations in RDDs,
> what work gets done on the master, worker, executor and driver?
>
> Thanks,
>
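
As a quick sanity check of the one-executor-per-worker point, from a running
spark-shell the driver can list the block managers it knows about; assuming
sc.getExecutorMemoryStatus is available in the Spark version in use, there
should be one entry per executor plus one for the driver itself:

// Each key is the "host:port" of a block manager: one per executor JVM on
// the worker nodes, plus one entry for the driver.
sc.getExecutorMemoryStatus.keys.foreach(println)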