Posted to issues@spark.apache.org by "Stavros Kontopoulos (JIRA)" <ji...@apache.org> on 2018/02/22 22:23:00 UTC

[jira] [Comment Edited] (SPARK-23485) Kubernetes should support node blacklist

    [ https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373558#comment-16373558 ] 

Stavros Kontopoulos edited comment on SPARK-23485 at 2/22/18 10:22 PM:
-----------------------------------------------------------------------

When an executor fails, all cases are covered via handleDisconnectedExecutors, which is scheduled at some rate and calls removeExecutor in CoarseGrainedSchedulerBackend, which in turn updates the blacklist info. When we want to launch new executors, TaskSchedulerImpl will terminate an executor that has already started on a blacklisted node. IMHO the Kubernetes Spark scheduler should fail fast and constrain which nodes pods are launched on, since it already knows that some nodes are not an option. For example, this could be done with: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration.
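
A rough, untested sketch of what that fail-fast placement could look like on the driver side (not taken from the Spark code base): it assumes the fabric8 kubernetes-client model builders the Kubernetes backend already depends on, {{blacklisted}} stands in for whatever {{scheduler.nodeBlacklist()}} returns, and the pod template is a placeholder for however the backend actually builds executor pods. It uses a hard node anti-affinity rule, a pod-side alternative to taints/tolerations, keyed on the kubernetes.io/hostname label:

{code:scala}
import io.fabric8.kubernetes.api.model._

// Hypothetical input: host names currently blacklisted by Spark's
// BlacklistTracker, i.e. roughly what scheduler.nodeBlacklist() returns.
val blacklisted: Seq[String] = Seq("node-1", "node-2")

// "NotIn" match expression over the well-known hostname label.
val notOnBlacklistedNodes = new NodeSelectorRequirementBuilder()
  .withKey("kubernetes.io/hostname")
  .withOperator("NotIn")
  .withValues(blacklisted: _*)
  .build()

val selectorTerm = new NodeSelectorTermBuilder()
  .withMatchExpressions(notOnBlacklistedNodes)
  .build()

// Hard (requiredDuringScheduling) node anti-affinity: the kube scheduler
// will never place the executor pod on a blacklisted node.
val antiAffinity = new AffinityBuilder()
  .withNewNodeAffinity()
    .withNewRequiredDuringSchedulingIgnoredDuringExecution()
      .withNodeSelectorTerms(selectorTerm)
    .endRequiredDuringSchedulingIgnoredDuringExecution()
  .endNodeAffinity()
  .build()

// Placeholder executor pod template; the real backend builds this elsewhere.
val executorPodTemplate: Pod = new PodBuilder()
  .withNewMetadata().withGenerateName("spark-exec-").endMetadata()
  .withNewSpec()
    .addNewContainer().withName("executor").withImage("spark-executor:2.3.0").endContainer()
  .endSpec()
  .build()

// Attach the anti-affinity before submitting the pod to the API server.
val executorPod: Pod = new PodBuilder(executorPodTemplate)
  .editSpec()
    .withAffinity(antiAffinity)
  .endSpec()
  .build()
{code}

Taints would achieve something similar from the node side, but that would require Spark to patch node objects, so the pod-side affinity route looks lighter-weight.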

 


was (Author: skonto):
I guess everything is covered via handleDisconnectedExecutors, which is scheduled at some rate and then calls removeExecutor in CoarseGrainedSchedulerBackend, which updates blacklist info.

 

> Kubernetes should support node blacklist
> ----------------------------------------
>
>                 Key: SPARK-23485
>                 URL: https://issues.apache.org/jira/browse/SPARK-23485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not use for running tasks (e.g., because of bad hardware).  When running in yarn, this blacklist is used to avoid ever allocating resources on blacklisted nodes: https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this is incorrect -- but I didn't see any references to {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}}, so it seems this is missing.  Thought of this while looking at SPARK-19755, a similar issue on Mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org