Posted to yarn-issues@hadoop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/03/08 20:47:38 UTC

[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly

    [ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901945#comment-15901945 ] 

ASF GitHub Bot commented on YARN-6302:
--------------------------------------

GitHub user szegedim opened a pull request:

    https://github.com/apache/hadoop/pull/200

    YARN-6302 Fail the node, if Linux Container Executor is not configured properly

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/szegedim/hadoop YARN-6302

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #200
    
----
commit cb97a1911c0df3528c49aa0ba96e7bc6233d630a
Author: Miklos Szegedi <mi...@cloudera.com>
Date:   2017-03-07T22:35:16Z

    YARN-6302 Throw on error 24
    
    Change-Id: Ia676061fd49cc7f54dbd9ae22bb999d4ea8a965b

commit 6f7872e99f5be813c74493dd204e14355049659d
Author: Miklos Szegedi <mi...@cloudera.com>
Date:   2017-03-08T03:37:10Z

    YARN-6302 Shutdown on error 24
    
    Change-Id: Ib17d4a357b6fdf1a6d940f0641770054f1f73e81

commit 03f4cd8a1391360ea3d7790b1044421eb05d6d2d
Author: Miklos Szegedi <mi...@cloudera.com>
Date:   2017-03-08T19:47:03Z

    YARN-6302 Mark node unhealthy on error 24
    
    Change-Id: Ib1e7215f9dac6825bda2eb54707782c59f19eb0c

----


> Fail the node, if Linux Container Executor is not configured properly
> ---------------------------------------------------------------------
>
>                 Key: YARN-6302
>                 URL: https://issues.apache.org/jira/browse/YARN-6302
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>            Priority: Minor
>
> We have a cluster in which one node has a misconfigured Linux Container Executor (LCE). Every AM or regular container launched on that node fails. The node still has resources available, so it keeps failing applications until an administrator notices the issue and decommissions the node. AM blacklisting helps only if the application is already running.
> As a possible improvement, when the LCE is in use on the cluster and an NM gets certain errors back from it, such as error 24 (configuration not found), we should either stop allocating anything on that node or shut the node down entirely. That kind of problem does not normally fix itself, and it means that nothing can really run on that node.
> {code}
> Application application_1488920587909_0010 failed 2 times due to AM Container for appattempt_1488920587909_0010_000002 exited with exitCode: -1000
> Failing this attempt.Diagnostics: Application application_1488920587909_0010 initialization failed (exitCode=24) with output:
> For more detailed output, check the application tracking page: http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then click on links to logs of each attempt.
> . Failing the application.
> {code}
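
The behaviour proposed in the description (and named in the last commit, "Mark node unhealthy on error 24") could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the pull request: the class LceHealthSketch, the NodeHealth type and the onContainerExecutorExit method are hypothetical names invented for this example and are not part of the Hadoop API. Only the exit code 24 is taken from the issue text.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
public class LceHealthSketch {

    // Exit code quoted in the issue description ("error 24 configuration not found").
    static final int LCE_CONFIG_ERROR = 24;

    /** Minimal stand-in for a node's health state (an assumption for this example). */
    static final class NodeHealth {
        // null means healthy; otherwise holds the reason the node was marked unhealthy.
        private final AtomicReference<String> unhealthyReason = new AtomicReference<>(null);

        boolean isHealthy() {
            return unhealthyReason.get() == null;
        }

        String getReason() {
            return unhealthyReason.get();
        }

        void markUnhealthy(String reason) {
            // Keep the first reason; later reports do not overwrite it.
            unhealthyReason.compareAndSet(null, reason);
        }
    }

    /**
     * Called with the exit code of a container launch attempt.
     * A configuration error is treated as permanent, so the node is flagged
     * instead of letting further containers fail on it.
     */
    static void onContainerExecutorExit(int exitCode, NodeHealth health) {
        if (exitCode == LCE_CONFIG_ERROR) {
            health.markUnhealthy(
                "Linux Container Executor exited with code " + exitCode
                + "; the node needs administrator attention");
        }
    }

    public static void main(String[] args) {
        NodeHealth health = new NodeHealth();
        onContainerExecutorExit(1, health);   // ordinary failure: node stays healthy
        System.out.println("healthy after exit code 1: " + health.isHealthy());
        onContainerExecutorExit(24, health);  // configuration error: node is flagged
        System.out.println("healthy after exit code 24: " + health.isHealthy());
        System.out.println("reason: " + health.getReason());
    }
}
```

The design point is that only errors that cannot fix themselves flip the health flag, so an ordinary container failure does not take the node out of scheduling.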



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
