Posted to user@spark.apache.org by hantuzun <ma...@hantuzun.com> on 2018/04/08 23:06:46 UTC

[Mesos] How to Disable Blacklisting on Mesos?

Hi all,

Spark currently has blacklisting enabled on Mesos, no matter what:
[SPARK-19755][Mesos] Blacklist is always active for
MesosCoarseGrainedSchedulerBackend

Blacklisting also prevents new drivers from running on our nodes where
previous drivers had failed tasks.

We've tried restarting the Spark dispatcher before submitting new jobs. Even
creating new machines (with the same hostname) does not help.

Looking at TaskSetBlacklist
<https://github.com/apache/spark/blob/e18d6f5326e0d9ea03d31de5ce04cb84d3b8ab37/core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala#L66>,
I don't understand how a fresh Spark job submitted from a fresh Spark
Dispatcher reports all nodes as blacklisted right away. How does Spark
know about previous task failures?

This issue severely disrupts us. How can we disable blacklisting on
Spark 2.3.0? Creative ideas are welcome :)
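For reference, these are the standard settings one would normally expect to turn blacklisting off; in our testing they do not affect the Mesos backend described in SPARK-19755 (the master URL and application jar path below are placeholders):

```shell
# Standard blacklist settings; per SPARK-19755 the Mesos coarse-grained
# backend appears to ignore them. Master URL and jar path are placeholders.
spark-submit \
  --master mesos://dispatcher:7077 \
  --deploy-mode cluster \
  --conf spark.blacklist.enabled=false \
  --conf spark.blacklist.task.maxTaskAttemptsPerNode=4 \
  /path/to/app.jar
```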

Best,
Han



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: [Mesos] How to Disable Blacklisting on Mesos?

Posted by "Susan X. Huynh" <xh...@mesosphere.io>.
Hi Han,

You may be seeing the same issue I described here:
https://issues.apache.org/jira/browse/SPARK-22342?focusedCommentId=16411780&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16411780
Do you see "TASK_LOST" in your driver logs? I got past that issue by
updating my version of libmesos (see my second comment in the ticket).
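A quick way to check for that pattern is to grep the driver log for TASK_LOST status updates. A tiny sample log stands in here so the command is self-contained; point grep at your actual driver log instead (its location depends on your deployment):

```shell
# Count TASK_LOST status updates in a driver log.
# /tmp/driver-sample.log is a stand-in; use your real driver log path.
printf 'INFO TaskSetManager: task 3.0 TASK_LOST on node-1\n' > /tmp/driver-sample.log
grep -c 'TASK_LOST' /tmp/driver-sample.log
```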

There's also this PR that is in progress:
https://github.com/apache/spark/pull/20640

Susan

On Sun, Apr 8, 2018 at 4:06 PM, hantuzun <ma...@hantuzun.com> wrote:

> Hi all,
>
> Spark currently has blacklisting enabled on Mesos, no matter what:
> [SPARK-19755][Mesos] Blacklist is always active for
> MesosCoarseGrainedSchedulerBackend
>
> Blacklisting also prevents new drivers from running on our nodes where
> previous drivers had failed tasks.
>
> We've tried restarting Spark dispatcher before sending new tasks. Even
> creating new machines (with the same hostname) does not help.
>
> Looking at TaskSetBlacklist
> <https://github.com/apache/spark/blob/e18d6f5326e0d9ea03d31de5ce04cb84d3b8ab37/core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala#L66>
> , I don't understand how a fresh Spark job submitted from a fresh Spark
> Dispatcher starts saying all the nodes are blacklisted right away. How does
> Spark know previous task failures?
>
> This issue severely interrupts us. How could we disable blacklisting on
> Spark 2.3.0? Creative ideas are welcome :)
>
> Best,
> Han
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>


-- 
Susan X. Huynh
Software engineer, Data Agility
xhuynh@mesosphere.com