Posted to user@spark.apache.org by Xiaoye Sun <su...@gmail.com> on 2016/10/11 20:49:42 UTC

one executor runs multiple parallel tasks VS multiple executors each runs one task

Hi,

Currently, I am running Spark with the standalone scheduler on 3
machines in our cluster: one runs the Spark Master and the other two
run Spark Workers.

We run a machine learning application on this small-scale testbed. A
particular stage in my application is divided into 10 parallel tasks, so I
want to know the pros and cons of different cluster configurations.

Conf 1: Multiple executors, each running one task.
Each worker has 5 executors, and each executor has 1 CPU core. With this
configuration, the scheduler assigns one task to each executor, so each
task probably runs in a different JVM.
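
On the standalone scheduler, Conf 1 could be requested with spark-submit flags roughly like the following (a sketch only; the master URL, memory size, and application jar are placeholder values, and splitting a worker into multiple executors via --executor-cores requires Spark 1.4+):

```shell
# Conf 1 sketch: ask for 10 single-core executors (5 per worker).
# spark://master:7077, 2g, and my_app.jar are assumed placeholders.
spark-submit \
  --master spark://master:7077 \
  --executor-cores 1 \
  --total-executor-cores 10 \
  --executor-memory 2g \
  my_app.jar
```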

Conf 2: One executor running multiple tasks.
Each worker has only one executor, and each executor has 5 CPU cores. In
this case, the scheduler assigns 5 tasks to each executor, so tasks on the
same executor probably run in the same process but on different threads.
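
Conf 2 would correspond to flags roughly like these (again a sketch with assumed placeholder values; with no --executor-cores limit, standalone mode gives each application one executor per worker by default):

```shell
# Conf 2 sketch: one 5-core executor per worker (2 workers -> 10 cores).
# The larger single heap replaces five small per-executor heaps.
spark-submit \
  --master spark://master:7077 \
  --executor-cores 5 \
  --total-executor-cores 10 \
  --executor-memory 10g \
  my_app.jar
```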

I think in many cases Conf 2 is preferable to Conf 1, since tasks in the
same executor can share the block manager, so data shared among these
tasks (e.g. broadcast data) doesn't need to be transferred multiple
times. However, I am wondering whether there is a scenario where Conf 1
is preferable, and whether the same conclusion holds when the scheduler
is YARN or Mesos.
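
The sharing argument can be illustrated outside of Spark with plain Python threads (a toy sketch, not Spark code: get_broadcast here stands in for fetching a broadcast block, and load_count counts how many copies the process fetches):

```python
# Toy illustration: tasks that are threads in one process (Conf 2)
# fetch a shared "broadcast" payload once; separate processes (Conf 1)
# would each have to fetch their own copy.
import threading

load_count = 0   # how many times the payload was fetched in this process
_cache = None
_lock = threading.Lock()

def get_broadcast():
    """Return the shared payload, fetching it at most once per process."""
    global load_count, _cache
    with _lock:
        if _cache is None:
            load_count += 1          # simulate transferring the broadcast data
            _cache = list(range(1000))
        return _cache

def task(results, i):
    data = get_broadcast()           # every task reads the same shared copy
    results[i] = sum(data)

results = {}
threads = [threading.Thread(target=task, args=(results, i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(load_count)  # 1 -- five "tasks" in one process fetched the data once
```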

Thanks!

Best,
Xiaoye

Re: one executor runs multiple parallel tasks VS multiple executors each runs one task

Posted by Denis Bolshakov <bo...@gmail.com>.
Look here
http://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications

Probably it will help a bit.

Best regards,
Denis

On 11 Oct 2016 at 23:49, "Xiaoye Sun" <su...@gmail.com> wrote:
