Posted to user@hadoop.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2012/11/05 17:32:31 UTC

Job running on YARN gets automatically killed after 10-12 minutes

Hi,

  My job running on the YARN framework gets killed automatically after
10-12 minutes.

  I have changed the monitoring time limit in Client.java, which comes with
the distributed shell example, and also increased the values of a set of
interval parameters in $HADOOP_CONF_DIR/yarn-site.xml tenfold. Even so, the
same kind of error repeats.

Note: I am not sending frequent heartbeats from the AM to the RM, nor am I
sending frequent container requests to the RM.

Content from RM's log:
=====================


2012-11-05 05:50:41,721 INFO  fifo.FifoScheduler
(FifoScheduler.java:containerCompleted(721)) - Application
appattempt_1352112580456_0001_000001 released container
container_1352112580456_0001_01_000004 on node: host: isredeng:33055
#containers=2 available=4096 used=4096 with event: FINISHED
2012-11-05 06:03:03,855 INFO  util.AbstractLivelinessMonitor
(AbstractLivelinessMonitor.java:run(111)) -
Expired:appattempt_1352112580456_0001_000001 Timed out after 600 secs
2012-11-05 06:03:03,867 INFO  attempt.RMAppAttemptImpl
(RMAppAttemptImpl.java:handle(483)) - appattempt_1352112580456_0001_000001
State change from RUNNING to FAILED



Content from NM's log:
======================


2012-11-05 06:03:04,364 INFO  containermanager.AuxServices
(AuxServices.java:handle(160)) - Got event APPLICATION_STOP for appId
application_1352112580456_0001
2012-11-05 06:03:04,373 INFO  application.Application
(ApplicationImpl.java:handle(387)) - Application
application_1352112580456_0001 transitioned from
APPLICATION_RESOURCES_CLEANINGUP to FINISHED


Is this behavior not controllable through any of the parameters in the XML
configuration files?

Thanks & Regards,
Kishore

Re: Job running on YARN gets automatically killed after 10-12 minutes

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Is this your custom application and not, say, MapReduce or the distributed shell?

If that is the case, the ApplicationMaster needs to ping the ResourceManager regularly so that the RM knows it is alive. This is done by making an allocate(..) call, which is part of the scheduler API. You should do this irrespective of whether you have any new container requests.

The default liveliness interval is 10 minutes (the "Timed out after 600 secs" line in your RM log), which is why your app is getting killed roughly after that much time.
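
As a rough illustration of the advice above, here is a minimal sketch of a background heartbeat loop in an ApplicationMaster. It is not taken from the distributed shell example or from any Hadoop source: RmLink is a hypothetical stand-in for whatever object your AM uses to talk to the RM, and its allocate() method is assumed to wrap the real allocate(..) call with an empty list of container requests. The 100 ms period in main is only there to make the demo short; a real AM would pick a period well below the 600-second expiry.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AmHeartbeatSketch {

    /** Hypothetical stand-in for the AM's connection to the RM; not a Hadoop type. */
    interface RmLink {
        /** Assumed to wrap the real allocate(..) call, possibly with no new requests. */
        void allocate() throws Exception;
    }

    /** Starts a daemon thread that calls rm.allocate() every periodMs milliseconds. */
    static ScheduledExecutorService startHeartbeat(RmLink rm, long periodMs) {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "am-heartbeat");
                t.setDaemon(true); // do not keep the JVM alive just for heartbeats
                return t;
            });
        timer.scheduleAtFixedRate(() -> {
            try {
                rm.allocate();
            } catch (Exception e) {
                // An uncaught exception would cancel all later runs, so log and go on.
                System.err.println("heartbeat failed: " + e);
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        return timer;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger pings = new AtomicInteger();
        // The counter plays the role of the RM here; each call is one heartbeat.
        ScheduledExecutorService hb = startHeartbeat(pings::incrementAndGet, 100);
        Thread.sleep(550); // the AM's real work would happen here
        hb.shutdownNow();  // stop heartbeating once the AM is finishing
        System.out.println("heartbeats sent: " + pings.get());
    }
}
```

The try/catch inside the task matters: scheduleAtFixedRate suppresses all further executions once a run throws, so a single failed call would otherwise silently stop the heartbeats and the attempt would expire anyway.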

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Nov 5, 2012, at 8:32 AM, Krishna Kishore Bonagiri wrote:

> Hi,
> 
>   My job that is running on YARN framework gets killed automatically after 10-12 minutes.  
> 
>   I have changed the monitoring time limit Client.java that comes with distributed shell example, and also bumped values for a set of interval parameters in $HADOOP_CONF_DIR/yarn-site.xml by 10 fold. Then also the same kind of error repeats.
> 
> Note: I am not sending frequent heartbeats to the RM from AM, also not sending frequent container requests to RM. 
> 
> Content from RM's log:
> =====================
> 
> 
> 2012-11-05 05:50:41,721 INFO  fifo.FifoScheduler (FifoScheduler.java:containerCompleted(721)) - Application appattempt_1352112580456_0001_000001 released container container_1352112580456_0001_01_000004 on node: host: isredeng:33055 #containers=2 available=4096 used=4096 with event: FINISHED
> 2012-11-05 06:03:03,855 INFO  util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(111)) - Expired:appattempt_1352112580456_0001_000001 Timed out after 600 secs
> 2012-11-05 06:03:03,867 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(483)) - appattempt_1352112580456_0001_000001 State change from RUNNING to FAILED
> 
> 
> 
> Content from NM's log:
> ======================
> 
> 
> 2012-11-05 06:03:04,364 INFO  containermanager.AuxServices (AuxServices.java:handle(160)) - Got event APPLICATION_STOP for appId application_1352112580456_0001
> 2012-11-05 06:03:04,373 INFO  application.Application (ApplicationImpl.java:handle(387)) - Application application_1352112580456_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 
> 
> Is this behavior not controllable by any of the parameters in XML configuration files?
> 
> Thanks & Regards,
> Kishore
