Posted to mapreduce-user@hadoop.apache.org by YouPeng Yang <yy...@gmail.com> on 2013/02/02 15:43:45 UTC

Re: YARN NM containers were killed

Hi all,
I am sorry to bother you again, but I have to raise this problem once more.
I really want to understand why some of the containers were killed.

The details of this situation are described in the mail I posted a few
days ago (quoted below).
My questions:
1. Why were 2 containers created on Hadoop02 while Hadoop04 got none?
   Is this normal?
2. What is the principle that determines where containers are created?
3. Why were two of the containers (container_*_000003 and
   container_*_000002) killed, while container_*_000001 succeeded?
   Is this normal?



Any suggestions will be appreciated.


regards
YouPeng Yang


2013/1/31 YouPeng Yang <yy...@gmail.com>

> Hi
>
>    I posted my question a day ago. Could somebody please help me
> figure out what the problem is?
>    Thank you.
> regards
> YouPeng Yang
>
>
> ---------- Forwarded message ----------
> From: YouPeng Yang <yy...@gmail.com>
> Date: 2013/1/30
> Subject: YARN NM containers were killed
> To: user@hadoop.apache.org
>
>
> I tested hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar on my Hadoop
> environment (1 RM: Hadoop01; 3 NMs: Hadoop02, Hadoop03, Hadoop04;
> CDH 4.1.2 on RHEL 5.5):
> ./bin/hadoop jar
> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar
>  wordcount 1/input output
>
> When I checked the logs, I was confused by the following:
> Hadoop created 2 containers on Hadoop02 and 1 container on Hadoop03,
> but none on Hadoop04.
>
> The results of the container processing:
>
> Hadoop02:
> * container_1359422495723_0001_01_000001
> (its state changed as follows: NEW --> LOCALIZING --> LOCALIZED --> RUNNING
> --> KILLING --> EXITED_WITH_SUCCESS)
>
> The log indicates that:
> NodeStatusUpdaterImpl: Sending out status for container: container_id {,
> app_attempt_id {, application_id {, id: 1, cluster_timestamp:
> 1359422495723, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics:
> "", exit_status: -1000,
>  ContainerLaunch: Container container_1359422495723_0001_01_000001
> succeeded
> Container: Container container_1359422495723_0001_01_000001 transitioned
> from RUNNING to EXITED_WITH_SUCCESS
>  ContainerLaunch: Cleaning up container
> container_1359422495723_0001_01_000001
> NMAuditLogger: USER=hadoop OPERATION=Container Finished - Succeeded
> TARGET=ContainerImpl RESULT=SUCCESSAPPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000001
>  * container_1359422495723_0001_01_000003
> (its state changed as follows: NEW --> LOCALIZING --> LOCALIZED --> RUNNING
> --> KILLING --> CONTAINER_CLEANEDUP_AFTER_KILL --> DONE)
>
> The log indicates that:
> NodeStatusUpdaterImpl: Sending out status for container: container_id {,
> app_attempt_id {, application_id {, id: 1, cluster_timestamp:
> 1359422495723, }, attemptId: 1, }, id: 3, }, state: C_RUNNING, diagnostics:
> "Container killed by the ApplicationMaster.\n", exit_status: -1000,
>  DefaultContainerExecutor: Exit code from task is : 137
> NMAuditLogger: USER=hadoop OPERATION=Container Finished - Killed
> TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000003
>
> Hadoop03:
>         * container_1359422495723_0001_01_000002
> (its state changed as follows: NEW --> LOCALIZING --> LOCALIZED --> RUNNING
> --> KILLING --> CONTAINER_CLEANEDUP_AFTER_KILL --> DONE)
>  NodeStatusUpdaterImpl: Sending out status for container: container_id {,
> app_attempt_id {, application_id {, id: 1, cluster_timestamp:
> 1359422495723, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics:
> "Container killed by the ApplicationMaster.\n", exit_status: -1000,
>         DefaultContainerExecutor: Exit code from task is : 143
> NMAuditLogger: USER=hadoop OPERATION=Container Finished - Killed
> TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000002
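
A note on the two exit codes above (137 and 143): by common Unix shell
convention, an exit status above 128 means the process was terminated by
signal number (status - 128). The sketch below decodes the two values under
that convention. It assumes that DefaultContainerExecutor reports the shell's
exit status unchanged, which the logs do not state explicitly. The helper
name describe_exit_code is made up for this illustration. The decoding only
says which signal ended the process; who sent it is what the diagnostics
string ("Container killed by the ApplicationMaster.") records.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the shell convention that a value
    above 128 means 'terminated by signal (code - 128)'.

    Assumption: the status was passed through unchanged by whatever
    launched the process.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


# The two exit codes reported by DefaultContainerExecutor in the logs above.
for code in (137, 143):
    print(code, "->", describe_exit_code(code))
```

On a Linux host this maps 137 to signal 9 and 143 to signal 15; signal
names are platform dependent, so the fallback branch prints the bare number
where a name is not available.
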
>
> My questions:
> 1. Why were 2 containers created on Hadoop02 while Hadoop04 got none?
>    Is this normal?
> 2. What is the principle that determines where containers are created?
> 3. Why were two of the containers (container_*_000003 and
>    container_*_000002) killed, while container_*_000001 succeeded?
>    Is this normal?
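
One way to check the placement behind question 1 is to tally the "Assigned
container" messages that FiCaSchedulerNode writes to the ResourceManager log
quoted below. The following is a minimal sketch, not a general log parser:
the three sample messages are copied from that log and re-joined onto single
lines (the mail client wrapped them), and the regular expression and the
function name tally_assignments are assumptions made for this illustration.

```python
import re
from collections import defaultdict

# FiCaSchedulerNode messages copied from the ResourceManager log quoted in
# this mail, re-joined onto single lines because the mail client wrapped them.
LOG_LINES = [
    "Assigned container container_1359422495723_0001_01_000001 of capacity "
    "memory: 1536 on host Hadoop02:39876, which currently has 1 containers, "
    "memory: 1536 used and memory: 6656 available",
    "Assigned container container_1359422495723_0001_01_000002 of capacity "
    "memory: 1024 on host Hadoop03:39387, which currently has 1 containers, "
    "memory: 1024 used and memory: 7168 available",
    "Assigned container container_1359422495723_0001_01_000003 of capacity "
    "memory: 1024 on host Hadoop02:39876, which currently has 2 containers, "
    "memory: 2560 used and memory: 5632 available",
]

# Hypothetical pattern written for this message layout only.
ASSIGNED = re.compile(
    r"Assigned container (?P<cid>container_\S+) of capacity "
    r"memory: (?P<mb>\d+) on host (?P<host>[^:]+):\d+"
)


def tally_assignments(lines):
    """Group (container id, requested MB) pairs by host name."""
    per_host = defaultdict(list)
    for line in lines:
        match = ASSIGNED.search(line)
        if match:
            per_host[match["host"]].append((match["cid"], int(match["mb"])))
    return dict(per_host)


placement = tally_assignments(LOG_LINES)
for host, containers in sorted(placement.items()):
    total_mb = sum(mb for _, mb in containers)
    print(f"{host}: {len(containers)} container(s), {total_mb} MB requested")
```

In the quoted log, container_*_000001 is the one the AMLauncher sets up for
the application attempt (its launch command runs MRAppMaster), so the two
containers on Hadoop02 are the ApplicationMaster plus one other container.
Hadoop04 never appears in an "Assigned container" message for this job.
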
>
>
> The logs from Hadoop01 (the RM) are as follows:
>
> 2013-01-29 09:23:48,904 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
> new applicationId: 1
> 2013-01-29 09:23:50,201 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
> with id 1 submitted by user hadoop
> 2013-01-29 09:23:50,204 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop
> IP=10.167.14.221 OPERATION=Submit Application Request
> TARGET=ClientRMServiceRESULT=SUCCESS APPID=application_1359422495723_0001
> 2013-01-29 09:23:50,221 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1359422495723_0001 State change from NEW to SUBMITTED
> 2013-01-29 09:23:50,221 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Registering appattempt_1359422495723_0001_000001
> 2013-01-29 09:23:50,222 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from NEW to SUBMITTED
> 2013-01-29 09:23:50,242 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> Application Submission: application_1359422495723_0001 from hadoop,
> currently active: 1
> 2013-01-29 09:23:50,250 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from SUBMITTED to
> SCHEDULED
> 2013-01-29 09:23:50,250 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1359422495723_0001 State change from SUBMITTED to ACCEPTED
> 2013-01-29 09:23:50,581 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000001 Container Transitioned from NEW to
> ALLOCATED
> 2013-01-29 09:23:50,581 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM
> Allocated Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000001
> 2013-01-29 09:23:50,581 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
> Assigned container container_1359422495723_0001_01_000001 of capacity
> memory: 1536 on host Hadoop02:39876, which currently has 1 containers,
> memory: 1536 used and memory: 6656 available
> 2013-01-29 09:23:50,582 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000001 Container Transitioned from
> ALLOCATED to ACQUIRED
> 2013-01-29 09:23:50,583 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from SCHEDULED to
> ALLOCATED
> 2013-01-29 09:23:50,587 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Launching masterappattempt_1359422495723_0001_000001
> 2013-01-29 09:23:50,606 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Setting up container Container: [ContainerId:
> container_1359422495723_0001_01_000001, NodeId: Hadoop02:39876,
> NodeHttpAddress: Hadoop02:8042, Resource: memory: 1536, Priority:
> org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@1f, State: NEW,
> Token: null, Status: container_id {, app_attempt_id {, application_id {,
> id: 1, cluster_timestamp: 1359422495723, }, attemptId: 1, }, id: 1, },
> state: C_NEW, ] for AM appattempt_1359422495723_0001_000001
> 2013-01-29 09:23:50,606 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Command to launch container container_1359422495723_0001_01_000001 :
> $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.mapreduce.container.log.dir=<LOG_DIR>
> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
> -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
> 2><LOG_DIR>/stderr
> 2013-01-29 09:23:51,030 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
> launching container Container: [ContainerId:
> container_1359422495723_0001_01_000001, NodeId: Hadoop02:39876,
> NodeHttpAddress: Hadoop02:8042, Resource: memory: 1536, Priority:
> org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@1f, State: NEW,
> Token: null, Status: container_id {, app_attempt_id {, application_id {,
> id: 1, cluster_timestamp: 1359422495723, }, attemptId: 1, }, id: 1, },
> state: C_NEW, ] for AM appattempt_1359422495723_0001_000001
> 2013-01-29 09:23:51,030 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from ALLOCATED to LAUNCHED
> 2013-01-29 09:23:51,575 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000001 Container Transitioned from ACQUIRED
> to RUNNING
> 2013-01-29 09:23:57,108 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
> registration appattempt_1359422495723_0001_000001
> 2013-01-29 09:23:57,109 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop
> IP=10.167.14.222 OPERATION=Register App Master
> TARGET=ApplicationMasterServicRESULT=SUCCESS
> APPID=application_1359422495723_0001
> APPATTEMPTID=appattempt_1359422495723_0001_000001
> 2013-01-29 09:23:57,109 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from LAUNCHED to RUNNING
> 2013-01-29 09:23:57,109 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1359422495723_0001 State change from ACCEPTED to RUNNING
> 2013-01-29 09:23:58,616 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000002 Container Transitioned from NEW to
> ALLOCATED
> 2013-01-29 09:23:58,616 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM
> Allocated Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000002
> 2013-01-29 09:23:58,616 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
> Assigned container container_1359422495723_0001_01_000002 of capacity
> memory: 1024 on host Hadoop03:39387, which currently has 1 containers,
> memory: 1024 used and memory: 7168 available
> 2013-01-29 09:23:59,168 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000002 Container Transitioned from
> ALLOCATED to ACQUIRED
> 2013-01-29 09:24:00,646 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000003 Container Transitioned from NEW to
> ALLOCATED
> 2013-01-29 09:24:00,646 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM
> Allocated Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000003
> 2013-01-29 09:24:00,646 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
> Assigned container container_1359422495723_0001_01_000003 of capacity
> memory: 1024 on host Hadoop02:39876, which currently has 2 containers,
> memory: 2560 used and memory: 5632 available
> 2013-01-29 09:24:00,659 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000002 Container Transitioned from ACQUIRED
> to RUNNING
> 2013-01-29 09:24:01,196 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000003 Container Transitioned from
> ALLOCATED to ACQUIRED
> 2013-01-29 09:24:01,657 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000003 Container Transitioned from ACQUIRED
> to RUNNING
> 2013-01-29 09:24:05,674 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000002 Container Transitioned from RUNNING
> to COMPLETED
> 2013-01-29 09:24:05,674 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Completed container: container_1359422495723_0001_01_000002 in state:
> COMPLETED event:FINISHED
>  2013-01-29 09:24:05,674 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM
> Released Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000002
> 2013-01-29 09:24:05,674 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
> Released container container_1359422495723_0001_01_000002 of capacity
> memory: 1024 on host Hadoop03:39387, which currently has 0 containers,
> memory: 0 used and memory: 8192 available, release resources=true
> 2013-01-29 09:24:05,674 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> Application appattempt_1359422495723_0001_000001 released container
> container_1359422495723_0001_01_000002 on node: host: Hadoop03:39387
> #containers=0 available=8192 used=0 with event: FINISHED
> 2013-01-29 09:24:07,524 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000003 Container Transitioned from RUNNING
> to COMPLETED
> 2013-01-29 09:24:07,524 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Completed container: container_1359422495723_0001_01_000003 in state:
> COMPLETED event:FINISHED
> 2013-01-29 09:24:07,524 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM
> Released Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000003
> 2013-01-29 09:24:07,524 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
> Released container container_1359422495723_0001_01_000003 of capacity
> memory: 1024 on host Hadoop02:39876, which currently has 1 containers,
> memory: 1536 used and memory: 6656 available, release resources=true
> 2013-01-29 09:24:07,525 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> Application appattempt_1359422495723_0001_000001 released container
> container_1359422495723_0001_01_000003 on node: host: Hadoop02:39876
> #containers=1 available=6656 used=1536 with event: FINISHED
> 2013-01-29 09:24:11,597 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from RUNNING to FINISHING
> 2013-01-29 09:24:11,597 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1359422495723_0001 State change from RUNNING to FINISHING
> 2013-01-29 09:24:12,554 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1359422495723_0001_01_000001 Container Transitioned from RUNNING
> to COMPLETED
> 2013-01-29 09:24:12,554 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Completed container: container_1359422495723_0001_01_000001 in state:
> COMPLETED event:FINISHED
> 2013-01-29 09:24:12,554 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM
> Released Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1359422495723_0001
> CONTAINERID=container_1359422495723_0001_01_000001
> 2013-01-29 09:24:12,555 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
> Released container container_1359422495723_0001_01_000001 of capacity
> memory: 1536 on host Hadoop02:39876, which currently has 0 containers,
> memory: 0 used and memory: 8192 available, release resources=true
> 2013-01-29 09:24:12,555 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> Application appattempt_1359422495723_0001_000001 released container
> container_1359422495723_0001_01_000001 on node: host: Hadoop02:39876
> #containers=0 available=8192 used=0 with event: FINISHED
> 2013-01-29 09:24:12,556 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1359422495723_0001_000001 State change from FINISHING to FINISHED
> 2013-01-29 09:24:12,557 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1359422495723_0001 State change from FINISHING to FINISHED
> 2013-01-29 09:24:12,558 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application
> Finished - Succeeded TARGET=RMAppManager
> RESULT=SUCCESSAPPID=application_1359422495723_0001
> 2013-01-29 09:24:12,558 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
> Application application_1359422495723_0001 requests cleared
> 2013-01-29 09:24:12,560 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Cleaning master appattempt_1359422495723_0001_000001
> 2013-01-29 09:24:12,560 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary:
> appId=application_1359422495723_0001,name=word
> count,user=hadoop,queue=default,state=FINISHED,trackingUrl=Hadoop01:8088/proxy/application_1359422495723_0001/jobhistory/job/job_1359422495723_0001,appMasterHost=Hadoop02,startTime=1359422630195,finishTime=1359422651597
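
The startTime and finishTime fields in the application summary above look
like milliseconds since the Unix epoch; that reading is an assumption, since
the log line itself does not give the unit. Under that assumption, this short
sketch converts them to UTC and computes the job's wall-clock duration. The
helper name from_epoch_ms is made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the ApplicationSummary line above.
START_MS = 1359422630195
FINISH_MS = 1359422651597


def from_epoch_ms(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware UTC datetime.

    Integer arithmetic is used so the millisecond part is not subject to
    floating-point rounding.
    """
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        milliseconds=millis
    )


elapsed_ms = FINISH_MS - START_MS
print("start :", from_epoch_ms(START_MS).isoformat())
print("finish:", from_epoch_ms(FINISH_MS).isoformat())
print("elapsed:", elapsed_ms / 1000, "s")
```

The converted start time falls at 01:23:50 UTC, which would line up with the
09:23:50 log timestamps if the cluster clock is set to UTC+8; that offset is
an inference and is not stated anywhere in the logs.
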
>
>
>
>