Posted to user@spark.apache.org by Mikhail Strebkov <st...@gmail.com> on 2016/01/25 22:57:24 UTC

Standalone scheduler issue - one job occupies the whole cluster somehow

Hi all,

Recently we started having issues with one of our background processing
scripts which we run on Spark. The cluster runs only two jobs. One job runs
for days, and the other usually takes a couple of hours. Both jobs run on a
cron schedule. The cluster is small: just 2 slaves, 24 cores, and 25.4 GB of
memory. Each job takes 6 cores and 6 GB per worker. So when both jobs are
running it's 12 cores out of 24 cores and 24 GB out of 25.4 GB. But
sometimes I see this:

https://www.dropbox.com/s/6uad4hrchqpihp4/Screen%20Shot%202016-01-25%20at%201.16.19%20PM.png
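
(For context, the resource settings for each job look roughly like this. This is
a simplified sketch of our setup, not the exact code; the app name and master URL
are placeholders, but the two numbers match what I described above:)

    import org.apache.spark.{SparkConf, SparkContext}

    // Per-job resource settings: 6 cores per job in total (via spark.cores.max)
    // and 6 GB per executor. Everything else the real jobs set is omitted here.
    val conf = new SparkConf()
      .setAppName("background-job")              // placeholder name
      .setMaster("spark://<master-host>:7077")   // standalone master, host omitted
      .set("spark.cores.max", "6")               // at most 6 cores per job, cluster-wide
      .set("spark.executor.memory", "6g")        // 6 GB per executor (normally one per worker)
    val sc = new SparkContext(conf)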

So basically the long-running job has somehow occupied the whole cluster, and
the fast one can't make any progress because the cluster has no free
resources left. That's what I see in the logs:

> 16/01/25 21:26:48 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient resources


When I log in to the slaves I see this:

slave 1:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 450 --hostname 10.191.4.151 *--cores 1 --app-id
> app-20160124152439-1468* --worker-url
> akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 451 --hostname 10.191.4.151 *--cores 1 --app-id
> app-20160124152439-1468* --worker-url
> akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker


slave 2:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 1 --hostname 10.253.142.59 *--cores 3 --app-id
> app-20160124152439-1468* --worker-url
> akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 448 --hostname 10.253.142.59 *--cores 1 --app-id
> app-20160124152439-1468* --worker-url
> akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker


So somehow Spark created 4 executors for the long job, 2 on each machine:
1 core + 1 core on one slave and 3 cores + 1 core on the other, for a total of
6 cores. But because the 6 GB setting is per executor, the job ends up occupying
24 GB instead of the expected 12 GB (2 executors with 3 cores each), and that
blocks the other Spark job.
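
To spell out the arithmetic (the 6 GB is reserved per executor, regardless of how
many cores that executor gets):

  expected:  2 executors x 3 cores = 6 cores,          2 x 6 GB = 12 GB
  observed:  4 executors with 1 + 1 + 3 + 1 = 6 cores, 4 x 6 GB = 24 GB

which leaves only ~1.4 GB of the cluster's 25.4 GB for the second job.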

My wild guess is that for some reason 1 executor of the long job failed, so
the job became 3 cores short and asked the scheduler for 3 more cores. The
scheduler then spread them across the slaves (say 2 cores + 1 core), but that
redistribution can't take effect until the short job finishes, because the
short job holds the rest of the memory. This would explain the 3 + 1 split on
one slave, but it doesn't explain the 1 + 1 split on the other.
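
If it helps to reason about this, my rough mental model of how the standalone
master hands out cores when spark.deploy.spreadOut is true (the default) is the
sketch below. This is a simplified approximation, not the real Master.scala
code, and all the names in it are made up:

    // Simplified, assumed model of spread-out core allocation in standalone mode.
    // Each worker that ends up with any cores here hosts an executor, and every
    // executor reserves the full spark.executor.memory (6 GB in our case), which
    // is how 6 cores spread over 4 executors turns into 4 x 6 GB = 24 GB.
    case class WorkerState(id: String, var freeCores: Int, var freeMemMb: Int)

    def spreadOutCores(coresWanted: Int,
                       executorMemMb: Int,
                       workers: Seq[WorkerState]): Map[String, Int] = {
      val assigned = scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)
      var remaining = coresWanted
      // Only workers with room for one more executor's memory are considered.
      var usable = workers.filter(w => w.freeMemMb >= executorMemMb && w.freeCores > 0)
      var i = 0
      while (remaining > 0 && usable.nonEmpty) {
        val w = usable(i % usable.size)          // round-robin, one core at a time
        assigned(w.id) += 1
        w.freeCores -= 1
        remaining -= 1
        usable = usable.filter(_.freeCores > 0)  // drop workers with no cores left
        i += 1
      }
      assigned.toMap
    }

If something like that runs again after an executor dies, the replacement cores
can land on workers that then need a whole new 6 GB executor, which would match
what I'm seeing, but I may well be wrong about the details.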

Has anyone experienced anything similar? Any ideas on how to avoid it?

Thanks,
Mikhail