Posted to yarn-issues@hadoop.apache.org by "Jonathan Hung (Jira)" <ji...@apache.org> on 2019/08/27 18:58:00 UTC

[jira] [Reopened] (YARN-7585) NodeManager should go unhealthy when state store throws DBException

     [ https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung reopened YARN-7585:
---------------------------------

Reopening this issue to track the backport to branch-2.

Waiting for YARN-8200 to be merged before backporting this, since the fix for this JIRA touches some of that code.

> NodeManager should go unhealthy when state store throws DBException 
> --------------------------------------------------------------------
>
>                 Key: YARN-7585
>                 URL: https://issues.apache.org/jira/browse/YARN-7585
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>              Labels: release-blocker
>             Fix For: 3.1.0
>
>         Attachments: YARN-7585.001.patch, YARN-7585.002.patch, YARN-7585.003.patch
>
>
> If work-preserving recovery is enabled, the NM will not start up if the state store fails to initialise. However, if the state store becomes unavailable after startup for any reason, the NM does not go unhealthy.
> Since the state store is not available, new containers cannot be started any more, and the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
> at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}
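
For readers following the backport, the requested behaviour can be illustrated with a minimal sketch. This is not the YARN-7585 patch: every name below (StateStoreHealthSketch, StoreException, StoreOperation, HealthTracker) is hypothetical, and StoreException merely stands in for org.iq80.leveldb.DBException so the example compiles on its own. The idea shown is only that a store failure at runtime should flip a health flag, in addition to being reported to the caller as an IOException.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from the YARN-7585 patch.
class StateStoreHealthSketch {

    /** Stand-in for org.iq80.leveldb.DBException, so the sketch is self-contained. */
    static class StoreException extends RuntimeException {
        StoreException(String message) {
            super(message);
        }
    }

    /** A write against the state store that may fail at runtime. */
    interface StoreOperation {
        void run();
    }

    /** Remembers whether the state store has failed since startup. */
    static class HealthTracker {
        private final AtomicBoolean healthy = new AtomicBoolean(true);
        private volatile String report = "";

        boolean isHealthy() {
            return healthy.get();
        }

        String getReport() {
            return report;
        }

        /** Runs the operation; on a store failure, flags the node unhealthy and rethrows as IOException. */
        void execute(StoreOperation op) throws IOException {
            try {
                op.run();
            } catch (StoreException e) {
                report = "State store failure: " + e.getMessage();
                healthy.set(false);
                throw new IOException(e);
            }
        }
    }

    public static void main(String[] args) {
        HealthTracker tracker = new HealthTracker();
        try {
            // Simulate the read-only file system failure from the report.
            tracker.execute(() -> {
                throw new StoreException("IO error: Read-only file system");
            });
        } catch (IOException e) {
            // The caller still sees the failure, as in the stack trace above.
        }
        System.out.println(tracker.isHealthy() + " " + tracker.getReport());
    }
}
```

In this sketch the flag is never reset, on the assumption that a failed store needs operator attention; whether the real fix behaves that way is not stated in this message.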



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
