You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Miklos Szegedi (JIRA)" <ji...@apache.org> on 2018/03/16 16:22:00 UTC

[jira] [Commented] (YARN-8037) CGroupsResourceCalculator excessive warnings on container relaunch

    [ https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402121#comment-16402121 ] 

Miklos Szegedi commented on YARN-8037:
--------------------------------------

Thank you, [~shanekumpf@gmail.com] for raising this. [~haibochen], we had a logic in one of earlier patches in YARN-7064 to remove multiple reporting of issues from CGroupsResourceCalculator and we removed based on your advice to support debugging. What is your opinion about this suggestion? Do you think we should add back some filtering here? https://issues.apache.org/jira/browse/YARN-7064?focusedCommentId=16323135&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16323135

 

> CGroupsResourceCalculator excessive warnings on container relaunch
> ------------------------------------------------------------------
>
>                 Key: YARN-8037
>                 URL: https://issues.apache.org/jira/browse/YARN-8037
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using the {{CGroupsResourceCalculator}} this results in the warning and exception below being logged every second until the relaunch occurs, which is excessive and filling up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim 12844
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_000002/cpuacct.stat (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: Failed to parse cgroups /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim 12844
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.usage_in_bytes (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a minimum. Alternatively, it may make sense to stop the existing {{MonitoringThread}} during relaunch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org