Posted to user@spark.apache.org by Gylfi <gy...@berkeley.edu> on 2015/07/07 04:11:16 UTC
Re: how to black list nodes on the cluster
Hi.
Have you tried enabling speculative execution?
When enabled, Spark launches a duplicate copy of a slow-running task on
another available slot and uses whichever copy finishes first.
These can be set at submission time with the following parameters:
spark.speculation
spark.speculation.interval
spark.speculation.multiplier
spark.speculation.quantile
See https://spark.apache.org/docs/latest/configuration.html under
Scheduling.
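As a rough sketch, you could set these on a SparkConf before building your
context (the values below are only illustrative defaults to experiment with,
not recommendations; they can equally be passed as --conf flags to
spark-submit):

```scala
import org.apache.spark.SparkConf

// Illustrative settings; tune the values for your own workload.
val conf = new SparkConf()
  .setAppName("speculation-example")
  .set("spark.speculation", "true")            // enable speculative execution
  .set("spark.speculation.interval", "100")    // how often (ms) to check for stragglers
  .set("spark.speculation.multiplier", "1.5")  // a task is "slow" if 1.5x slower than the median
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
```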
Regards,
Gylfi.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster-tp23650p23661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: how to black list nodes on the cluster
Posted by Gylfi <gy...@berkeley.edu>.
Hi again,
OK, in that case I do not know of any way to fix the problem other than
deleting the "bad" machine from the configuration and restarting, and you
will need admin privileges on the cluster for that :(
However, before we give up on speculative execution: I suspect the task is
being run again and again on the same "faulty" machine because that is where
the data resides.
You could try to store / persist your RDD with MEMORY_ONLY_2 or
MEMORY_AND_DISK_2 as that will force the creation of a replica of the data
on another node. Thus, with two replicas, the scheduler may choose to execute
the speculative task on the second node (I'm not sure about this, as I am
just not familiar enough with the Spark scheduler's priorities).
I'm not very hopeful but it may be worth a try (if you have the disk/RAM
space to be able to afford to duplicate all the data that is).
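Something along these lines (a sketch only; the input path is made up, and
`sc` is assumed to be an existing SparkContext):

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input path for illustration.
val data = sc.textFile("hdfs:///path/to/input")

// The "_2" storage levels keep a replica of each partition on a second node,
// giving the scheduler a locality-preferred alternative to the faulty machine.
data.persist(StorageLevel.MEMORY_AND_DISK_2)

// Materialize the RDD once so the replicas are actually created.
data.count()
```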
If not, I am afraid I am out of ideas ;)
Regards and good luck,
Gylfi.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster-tp23650p23704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.