Posted to yarn-issues@hadoop.apache.org by "Rohith (JIRA)" <ji...@apache.org> on 2015/06/08 13:52:01 UTC
[jira] [Resolved] (YARN-3775) Job does not exit after all node become unhealthy
[ https://issues.apache.org/jira/browse/YARN-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith resolved YARN-3775.
--------------------------
Resolution: Not A Problem
Closing as Not A Problem. Please reopen if you disagree.
> Job does not exit after all node become unhealthy
> -------------------------------------------------
>
> Key: YARN-3775
> URL: https://issues.apache.org/jira/browse/YARN-3775
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.1
> Environment:
> Version : 2.7.0
> OS: RHEL7
> NameNodes: xiachsh11 xiachsh12 (HA enabled)
> DataNodes: 5 xiachsh13-17
> ResourceManager: xiachsh11
> NodeManagers: 5 xiachsh13-17
> All nodes are OpenStack-provisioned:
> MEM: 1.5G
> Disk: 16G
> Reporter: Chengshun Xia
> Attachments: logs.tar.gz
>
>
> Running Terasort with a data size of 10G, all the containers exit once the disk space threshold of 0.90 is reached. At this point, the job does not exit with an error:
> 15/06/05 13:13:28 INFO mapreduce.Job: map 9% reduce 0%
> 15/06/05 13:13:52 INFO mapreduce.Job: map 10% reduce 0%
> 15/06/05 13:14:30 INFO mapreduce.Job: map 11% reduce 0%
> 15/06/05 13:15:11 INFO mapreduce.Job: map 12% reduce 0%
> 15/06/05 13:15:43 INFO mapreduce.Job: map 13% reduce 0%
> 15/06/05 13:16:38 INFO mapreduce.Job: map 14% reduce 0%
> 15/06/05 13:16:41 INFO mapreduce.Job: map 15% reduce 0%
> 15/06/05 13:16:53 INFO mapreduce.Job: map 16% reduce 0%
> 15/06/05 13:17:24 INFO mapreduce.Job: map 17% reduce 0%
> 15/06/05 13:17:53 INFO mapreduce.Job: map 18% reduce 0%
> 15/06/05 13:18:36 INFO mapreduce.Job: map 19% reduce 0%
> 15/06/05 13:19:03 INFO mapreduce.Job: map 20% reduce 0%
> 15/06/05 13:19:09 INFO mapreduce.Job: map 15% reduce 0%
> 15/06/05 13:19:32 INFO mapreduce.Job: map 16% reduce 0%
> 15/06/05 13:20:00 INFO mapreduce.Job: map 17% reduce 0%
> 15/06/05 13:20:36 INFO mapreduce.Job: map 18% reduce 0%
> 15/06/05 13:20:57 INFO mapreduce.Job: map 19% reduce 0%
> 15/06/05 13:21:22 INFO mapreduce.Job: map 18% reduce 0%
> 15/06/05 13:21:24 INFO mapreduce.Job: map 14% reduce 0%
> 15/06/05 13:21:25 INFO mapreduce.Job: map 9% reduce 0%
> 15/06/05 13:21:28 INFO mapreduce.Job: map 10% reduce 0%
> 15/06/05 13:22:22 INFO mapreduce.Job: map 11% reduce 0%
> 15/06/05 13:23:06 INFO mapreduce.Job: map 12% reduce 0%
> 15/06/05 13:23:41 INFO mapreduce.Job: map 9% reduce 0%
> 15/06/05 13:23:42 INFO mapreduce.Job: map 5% reduce 0%
> 15/06/05 13:24:38 INFO mapreduce.Job: map 6% reduce 0%
> 15/06/05 13:25:16 INFO mapreduce.Job: map 7% reduce 0%
> 15/06/05 13:25:53 INFO mapreduce.Job: map 8% reduce 0%
> 15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%
> The last progress update was at 15/06/05 13:26:35,
> and the current time is:
> [root@xiachsh11 logs]# date
> Fri Jun 5 19:19:59 EDT 2015
> [root@xiachsh11 logs]#
> [root@xiachsh11 logs]# yarn node -list
> 15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
> Total Nodes:0
> Node-Id Node-State Node-Http-Address Number-of-Running-Containers
> [root@xiachsh11 logs]#
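
The client progress lines quoted above show two symptoms: the map percentage drops several times (for example from 12% to 9% at 13:23:41), and the updates then stop entirely. As a rough illustration only, the following Python sketch flags both symptoms in a saved client log. The helper names (parse_progress, regressions, idle_time) are hypothetical and not part of Hadoop, and the sketch assumes the log timestamps and the `date` output are in the same time zone.

```python
import re
from datetime import datetime

# Matches client lines such as
# "15/06/05 13:23:41 INFO mapreduce.Job: map 9% reduce 0%".
PROGRESS = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def parse_progress(lines):
    """Return (timestamp, map_percent) pairs from client progress lines."""
    points = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            # Timestamps are assumed to be year/month/day, as in the report.
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            points.append((stamp, int(match.group(2))))
    return points


def regressions(points):
    """Return (timestamp, previous, current) wherever map progress dropped."""
    return [
        (cur_t, prev_p, cur_p)
        for (_, prev_p), (cur_t, cur_p) in zip(points, points[1:])
        if cur_p < prev_p
    ]


def idle_time(points, now):
    """Time elapsed between the last progress line and `now`."""
    return now - points[-1][0]


# The last seven progress lines from the report, copied verbatim.
sample = [
    "15/06/05 13:23:06 INFO mapreduce.Job: map 12% reduce 0%",
    "15/06/05 13:23:41 INFO mapreduce.Job: map 9% reduce 0%",
    "15/06/05 13:23:42 INFO mapreduce.Job: map 5% reduce 0%",
    "15/06/05 13:24:38 INFO mapreduce.Job: map 6% reduce 0%",
    "15/06/05 13:25:16 INFO mapreduce.Job: map 7% reduce 0%",
    "15/06/05 13:25:53 INFO mapreduce.Job: map 8% reduce 0%",
    "15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%",
]

points = parse_progress(sample)
drops = regressions(points)
# "Fri Jun 5 19:19:59 EDT 2015" from the `date` output in the report.
idle = idle_time(points, datetime(2015, 6, 5, 19, 19, 59))

for stamp, before, after in drops:
    print(f"{stamp}: map progress fell from {before}% to {after}%")
print(f"no progress update for {idle}")
```

Run against the full client log rather than this excerpt, the same helpers would list every drop in map progress and report how long the job has been silent.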
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)