Posted to user@spark.apache.org by Chen Jin <ka...@gmail.com> on 2014/01/26 01:28:02 UTC

how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally

Hi all,

From the Spark documentation, we can set the number of workers with
SPARK_WORKER_INSTANCES and the maximum number of cores each worker can
use with SPARK_WORKER_CORES. If I have five 8-core machines, which of
the following would perform better?
(a)
   SPARK_WORKER_INSTANCES = 8
   SPARK_WORKER_CORES = 1

and
(b)
   SPARK_WORKER_INSTANCES = 1
   SPARK_WORKER_CORES = 8

(a) gives us 40 workers with one core each; (b) gives 8 workers with
eight cores each. Any advice on which would lead to better
performance?
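
For reference, the two layouts being compared would look roughly like
this in conf/spark-env.sh in standalone mode. This is only a sketch
using the values from the question; the settings are read on each
machine, so the file would need to be the same on every node.

```shell
# conf/spark-env.sh (sketch; values taken from the question above)

# Option (a): 8 worker JVMs per machine, 1 core each
export SPARK_WORKER_INSTANCES=8
export SPARK_WORKER_CORES=1

# Option (b): 1 worker JVM per machine using all 8 cores
# export SPARK_WORKER_INSTANCES=1
# export SPARK_WORKER_CORES=8
```

Since the file is sourced as a shell script, the assignments take no
spaces around `=`.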

Thanks a lot,

-chen

Re: how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally

Posted by Chen Jin <ka...@gmail.com>.
Hi Archit,

Thanks for the detailed explanation. My cluster has 5 machines, each
with 8 cores and 48g of memory, so I meant to say that, for the entire
cluster:

(a) gives us 40 workers with one core each; (b) gives 5 workers with
eight cores each.
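
The cluster-wide numbers above follow from multiplying the per-machine
settings by the number of machines. A small sketch of that arithmetic
(the function name cluster_totals is made up for illustration and is
not part of Spark):

```shell
# Sketch: cluster-wide totals for a standalone layout.
# Arguments: machines, SPARK_WORKER_INSTANCES, SPARK_WORKER_CORES
cluster_totals() {
  machines=$1
  instances=$2
  cores=$3
  workers=$((machines * instances))   # worker JVMs in the whole cluster
  total_cores=$((workers * cores))    # cores offered to Spark in total
  echo "workers=$workers cores_per_worker=$cores total_cores=$total_cores"
}

cluster_totals 5 8 1   # option (a): workers=40 cores_per_worker=1 total_cores=40
cluster_totals 5 1 8   # option (b): workers=5 cores_per_worker=8 total_cores=40
```

Both layouts offer the same 40 cores in total; they differ only in how
many JVMs those cores are spread across.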

A follow-up question, given that each machine has 48g of memory:

(a)
   SPARK_WORKER_INSTANCES = 8
   SPARK_WORKER_CORES = 1
   SPARK_WORKER_MEMORY = 6g

(b)
   SPARK_WORKER_INSTANCES = 1
   SPARK_WORKER_CORES = 8
   SPARK_WORKER_MEMORY = 48g

Will setting (a) help with consuming a large dataset, given that, as
you said, each machine now runs 8 JVMs?
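
To make the comparison concrete, here is a rough sketch of how much
memory each layout hands to Spark on one machine. The helper name
committed_gb is hypothetical, and the sketch assumes
SPARK_WORKER_MEMORY applies to each worker instance separately.

```shell
# Sketch: memory handed to Spark workers on one machine, in GB.
# Arguments: SPARK_WORKER_INSTANCES, SPARK_WORKER_MEMORY (whole GB)
committed_gb() {
  echo $(( $1 * $2 ))
}

PHYSICAL_GB=48               # per machine, from the message above
a_gb=$(committed_gb 8 6)     # option (a): 8 workers x 6g
b_gb=$(committed_gb 1 48)    # option (b): 1 worker x 48g

for gb in "$a_gb" "$b_gb"; do
  echo "committed=${gb}g headroom=$((PHYSICAL_GB - gb))g"
done
```

Under that assumption both layouts commit the same 48g per machine, so
the difference is in how it is split (eight 6g workers versus one 48g
worker), and neither leaves headroom for the OS.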

Thanks a lot,

-chen

Re: how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally

Posted by Archit Thakur <ar...@gmail.com>.
Chen, the first one will launch 8 single-threaded JVMs per machine and
the second one will launch one 8-threaded JVM.
Performance depends on your data. If your data is small, the second one
is better because of the time it takes to launch 8 JVMs in the first
case. Also, if you have broadcast anything, it will have to be sent to
8 JVMs on each machine instead of one.
However, if you have a lot of data to process, the first one is better
because (i) the JVM launch time becomes negligible, and (ii) you'll
have 8 times the memory available for processing.
Assumption made: all machines have the same memory and computing power.
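
The broadcast point can be illustrated with a little arithmetic. This
is only a sketch: it assumes one copy of the broadcast value per worker
JVM, and the 100 MB size and the helper name are made up for the
example.

```shell
# Sketch: memory spent on broadcast copies per machine.
# Assumption: each worker JVM holds its own copy of the broadcast value.
BROADCAST_MB=100   # hypothetical size, not from the thread

broadcast_mb_per_machine() {
  # $1 = SPARK_WORKER_INSTANCES
  echo $(( $1 * BROADCAST_MB ))
}

echo "(a) $(broadcast_mb_per_machine 8) MB per machine"   # 8 copies
echo "(b) $(broadcast_mb_per_machine 1) MB per machine"   # 1 copy
```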

"""(a) gives us 40 workers with one core each; (b) gives 8 workers with
eight cores each. Any advice on which would lead to better
performance?"""

No, on each machine (a) gives you 8 workers with one core each, and (b)
gives you 1 worker with eight cores.

Let me know if you have any doubts.

Thanks and Regards,
Archit Thakur.


