Posted to issues@spark.apache.org by "Saisai Shao (JIRA)" <ji...@apache.org> on 2015/12/31 04:50:49 UTC

[jira] [Commented] (SPARK-12516) Properly handle NM failure situation for Spark on Yarn

    [ https://issues.apache.org/jira/browse/SPARK-12516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075693#comment-15075693 ] 

Saisai Shao commented on SPARK-12516:
-------------------------------------

Hi [~vanzin], what is your suggestion for this issue? I have failed to figure out a proper solution to address it.

> Properly handle NM failure situation for Spark on Yarn
> ------------------------------------------------------
>
>                 Key: SPARK-12516
>                 URL: https://issues.apache.org/jira/browse/SPARK-12516
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.0
>            Reporter: Saisai Shao
>
> A NodeManager (NM) failure makes all the executors belonging to that NM exit silently.
> Currently, in the YarnSchedulerBackend implementation, the driver receives an onDisconnect event when an executor is lost and then asks the AM for the loss reason; the AM holds this query open until the RM reports back the status of the lost container, and only then replies to the driver. When an NM fails, the RM cannot detect the failure until the NM heartbeat times out (10 minutes by default), so the driver's loss-reason query times out first (120 seconds). After that timeout, the executor state on the driver side is cleaned up, but on the AM side it is still maintained until the NM heartbeat timeout. This can potentially introduce the unexpected behaviors listed below; the query path itself is sketched first:
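> The following is a simplified illustration of that query path, not the actual Spark source; the message and method names are stand-ins for the real driver/AM endpoint messages, and the timeouts are the defaults mentioned above:
> {code:scala}
> import scala.concurrent.duration._
> import scala.concurrent.{Await, Future, TimeoutException}
>
> // Hypothetical message and reply types, only for illustration.
> case class GetExecutorLossReason(executorId: String)
> sealed trait ExecutorLossReason
> case class ExecutorExited(msg: String) extends ExecutorLossReason
> case class SlaveLost(msg: String) extends ExecutorLossReason
>
> // Driver side: on executor disconnect, ask the AM for the loss reason.
> def handleDisconnectedExecutor(
>     executorId: String,
>     askAm: GetExecutorLossReason => Future[ExecutorLossReason]): ExecutorLossReason = {
>   val reply = askAm(GetExecutorLossReason(executorId))
>   try {
>     // The AM holds this request until the RM reports the container status.
>     // If the NM itself died, the RM only notices after ~10 minutes, so this
>     // ask hits the driver-side RPC timeout (120 seconds) first.
>     Await.result(reply, 120.seconds)
>   } catch {
>     case _: TimeoutException =>
>       // The driver gives up and cleans its own executor state, but the AM
>       // still counts the container as running until the NM heartbeat expires.
>       SlaveLost(s"Executor $executorId disconnected, loss reason unknown")
>   }
> }
> {code}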
> ---
> * When dynamic allocation is disabled, the executor count on the driver side is lower than the count on the AM side during the window between the two timeouts (from 120 seconds to 10 minutes), and cannot be ramped back up to the expected number until the RM detects the NM failure and marks the related containers as completed (as sketched after the example below).
> {quote}
> For example, the target executor number is 10, spread across 5 NMs (2 executors per NM). When 1 NM fails, its 2 executors are lost. After the driver-side query timeout, the executor count on the driver side is 8, but on the AM side it is still 10, so the AM will not request additional containers until its own count drops to 8 (after 10 minutes).
> {quote}
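> A minimal sketch of why the AM stays idle in this window. This is an assumed simplification of the AM-side allocation bookkeeping, not the real YarnAllocator code:
> {code:scala}
> // New containers are only requested when the target exceeds what the AM
> // believes is already pending plus running.
> def missingExecutors(target: Int, pending: Int, runningOnAm: Int): Int =
>   math.max(0, target - pending - runningOnAm)
>
> // Dynamic allocation disabled, fixed target of 10. The driver has already
> // dropped the 2 executors from the failed NM, but the AM has not:
> missingExecutors(target = 10, pending = 0, runningOnAm = 10)  // 0 -> nothing requested
> // Only after the RM expires the NM (~10 minutes) does the AM see 8 running:
> missingExecutors(target = 10, pending = 0, runningOnAm = 8)   // 2 -> replacements requested
> {code}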
> ---
> * When dynamic allocation is enabled, the target executor number is maintained on both the driver and AM sides and synced between them. The driver-side number becomes correct after the driver query timeout (120 seconds), but the AM-side number stays incorrect until the NM failure is detected (10 minutes). In this case the actual executor number is less than the calculated one.
> {quote}
> For example, if the current target executor number on the driver side is N and on the AM side is M, then M - N is the number of lost executors.
> When the executor number needs to ramp up to A, the actual number will be A - (M - N).
> When the executor number needs to be brought down to B, the actual number will be max(0, B - (M - N)). When the actual number of executors is 0, the whole application hangs and only recovers if the driver requests more resources, or after the 10-minute timeout. (A worked version of this arithmetic is sketched after this quote.)
> This can be reproduced by running the SparkPi example in yarn-client mode with the following configurations:
> spark.dynamicAllocation.enabled    true
> spark.shuffle.service.enabled      true
> spark.dynamicAllocation.minExecutors 1
> spark.dynamicAllocation.initialExecutors 2
> spark.dynamicAllocation.maxExecutors 3
> In the middle of the job, kill one NM that is running only executors.
> {quote}
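> The arithmetic in the example above, worked through as a small illustration (not Spark code):
> {code:scala}
> // N = driver-side target, M = AM-side target; the (M - N) executors are
> // "phantom": the AM still counts them, the driver does not.
> def actualExecutors(requested: Int, staleDelta: Int): Int =
>   math.max(0, requested - staleDelta)
>
> val staleDelta = 2               // M - N, e.g. 2 executors on the dead NM
> actualExecutors(5, staleDelta)   // ramp up to A = 5, only 3 actually run
> actualExecutors(1, staleDelta)   // bring down to B = 1, max(0, -1) = 0, job hangs
> {code}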
> ---
> Possible solutions:
> * Sync the actual executor number from the driver to the AM after the RPC timeout (120 seconds), and also clean up the related state in the AM (a rough sketch follows).
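> A rough sketch of what that sync could look like; the message and handler names are hypothetical, not an existing Spark API:
> {code:scala}
> // Purely illustrative: the driver tells the AM to forget an executor whose
> // loss-reason query timed out, so both sides agree again.
> case class RemoveStaleExecutor(executorId: String)
>
> // AM side: drop the executor from the bookkeeping so the running count
> // matches the driver and replacement containers can be requested without
> // waiting for the NM heartbeat expiry (~10 minutes).
> def onRemoveStaleExecutor(
>     executorId: String,
>     runningExecutors: scala.collection.mutable.Set[String]): Unit = {
>   runningExecutors -= executorId
> }
> {code}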



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org