Posted to user@spark.apache.org by Gylfi <gy...@berkeley.edu> on 2015/07/07 04:11:16 UTC

Re: how to black list nodes on the cluster

Hi. 

Have you tried enabling speculative execution? 
When it is on, Spark can launch a speculative copy of a slow-running task on
another available executor slot and take whichever copy finishes first. 

These settings can be passed at execution time; the relevant parameters are: 
spark.speculation	
spark.speculation.interval	
spark.speculation.multiplier	
spark.speculation.quantile	

See https://spark.apache.org/docs/latest/configuration.html  under
Scheduling. 
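For example, the four parameters above could be set when building the
SparkConf (the values and app name here are just illustrative, not
recommendations -- tune them for your own workload):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only.
val conf = new SparkConf()
  .setAppName("SpeculationExample")
  .set("spark.speculation", "true")            // enable speculative execution
  .set("spark.speculation.interval", "100ms")  // how often to check for slow tasks
  .set("spark.speculation.multiplier", "1.5")  // "slow" = 1.5x the median task time
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
val sc = new SparkContext(conf)
```

The same keys can also go in spark-defaults.conf or be passed with
--conf on spark-submit.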

Regards, 
    Gylfi. 




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster-tp23650p23661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: how to black list nodes on the cluster

Posted by Gylfi <gy...@berkeley.edu>.
Hi again, 

Ok, in that case I do not know of any way to fix the problem other than
deleting the "bad" machine from the config and restarting .. and you will
need admin privileges on the cluster for that :(

However, before we give up on speculative execution: I suspect the task is
being run again and again on the same "faulty" machine because that is where
the data resides.
You could try to store / persist your RDD with MEMORY_ONLY_2 or
MEMORY_AND_DISK_2, as that will force the creation of a replica of the data
on another node. Then, with the data on two nodes, the scheduler may choose
to execute the speculative task on the second node (I'm not sure about this,
as I am just not familiar enough with the Spark scheduler's priorities).
I'm not very hopeful but it may be worth a try (if you have the disk/RAM
space to be able to afford to duplicate all the data that is). 
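A minimal sketch of what that would look like -- the input path and variable
names are hypothetical, but the _2 storage levels are the real Spark
StorageLevel constants that keep each block replicated on a second node:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input; any RDD works the same way.
val data = sc.textFile("hdfs:///path/to/input")

// The _2 suffix replicates each cached partition on a second node,
// so a speculative task copy has a data-local slot elsewhere.
val cached = data.persist(StorageLevel.MEMORY_AND_DISK_2)

cached.count()  // materialize the cache (and its replicas)
```

Note this roughly doubles the memory/disk footprint of the cached RDD,
which is the space cost mentioned above.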

If not, I am afraid I am out of ideas ;) 

Regards and good luck, 
    Gylfi. 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster-tp23650p23704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
