Posted to user@spark.apache.org by Tomer Benyamini <to...@gmail.com> on 2015/07/01 08:45:07 UTC

question about resource allocation on the spark standalone cluster

Hello spark-users,

I would like to use the Spark standalone cluster for multi-tenancy, running
multiple apps at the same time. The issue is that when submitting an app to
the standalone cluster you cannot pass "--num-executors" as you can on YARN,
only "--total-executor-cores". *This may cause starvation when submitting
multiple apps*. Here's an example: say I have a cluster of 4 machines, each
with 20GB RAM and 4 cores. If I submit using --total-executor-cores=4 and
--executor-memory=20GB (full spark-submit command sketched after the list),
I may end up with either of these two extreme resource allocations:
- 4 executors (one per machine) with 1 core and 20GB each, blocking the
entire cluster's memory
- 1 executor (on 1 machine) with 4 cores and 20GB, leaving the other 3
machines free for other apps.
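
For reference, the kind of submission I mean looks roughly like this (the
master URL, class name, and jar are just placeholders):

    spark-submit \
      --master spark://my-master:7077 \
      --total-executor-cores 4 \
      --executor-memory 20G \
      --class com.example.MyApp \
      my-app.jar

There is no --num-executors here; the standalone scheduler decides how
those 4 cores are spread across the workers.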

Is there a way to restrict / push the standalone cluster towards the 2nd
strategy (use all cores of a given worker before allocating on a second
worker)? A workaround we tried is to set SPARK_WORKER_CORES to 1,
SPARK_WORKER_MEMORY to 5g and SPARK_WORKER_INSTANCES to 4, but this is
suboptimal since it runs 4 worker instances on each machine, which adds JVM
overhead and does not allow sharing memory across partitions on the same
worker.
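
For completeness, the workaround amounts to roughly this in
conf/spark-env.sh on each of the 4 machines (values taken from the example
above):

    # Run 4 single-core workers per machine instead of one 4-core worker,
    # capping any single executor at 1 core / 5GB of that machine.
    export SPARK_WORKER_CORES=1
    export SPARK_WORKER_MEMORY=5g
    export SPARK_WORKER_INSTANCES=4

The price is 4 worker JVMs per machine and no memory sharing across
partitions that land in different workers, as described above.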

Thanks,
Tomer