Posted to user@spark.apache.org by "YI, XIAOCHUAN" <xy...@att.com> on 2015/10/30 22:11:16 UTC

RE: Spark tuning: increase number of active tasks

Hi,
Our team has a 40-node Hortonworks Hadoop cluster (HDP 2.2.4.2-2, 36 data nodes) with Apache Spark 1.2 and 1.4 installed.
Each node has 64 GB of RAM and 8 cores.

We are only able to use <= 72 executors with --executor-cores=2,
so we only get 144 active tasks when running PySpark programs.
[Stage 1:===============>                                    (596 + 144) / 2042]
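
For reference, this is roughly how the job is submitted; the master setting, memory value, and script name below are placeholders rather than our exact values:

  spark-submit \
    --master yarn-client \
    --num-executors 72 \
    --executor-cores 2 \
    --executor-memory 6g \
    our_job.py
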
If we use a larger number for --num-executors, the PySpark program exits with errors:
ERROR YarnScheduler: Lost executor 113 on hag017.example.com: remote Rpc client disassociated

I also tried Spark 1.4 with conf.set("spark.dynamicAllocation.enabled", "true"); however, it did not help us increase the number of active tasks.
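
Concretely, this is roughly what I set in the driver before creating the SparkContext (the app name and maxExecutors value are only examples, and I am not sure whether the external shuffle service also needs to be configured on the node managers for this to take effect):

  from pyspark import SparkConf, SparkContext

  # Ask for dynamic executor allocation instead of a fixed --num-executors.
  conf = (SparkConf()
          .setAppName("tuning-test")
          .set("spark.dynamicAllocation.enabled", "true")
          .set("spark.shuffle.service.enabled", "true")
          .set("spark.dynamicAllocation.maxExecutors", "288"))
  sc = SparkContext(conf=conf)
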
I would expect a larger number of active tasks given the cluster we have.
Could anyone advise on this? Thank you very much!

Shaun