Posted to mapreduce-dev@hadoop.apache.org by Chris Riccomini <cr...@linkedin.com> on 2012/09/21 21:23:50 UTC

RM with lost NMs results in massive log of AppAttemptId doesnt exist in cache

Hey all,

Is anyone else seeing this issue? It's unclear to me if I'm doing
something wrong, or if something is broken.

Thanks!
Chris

On 9/21/12 11:05 AM, "Chris Riccomini (JIRA)" <ji...@apache.org> wrote:

>Chris Riccomini created MAPREDUCE-4672:
>------------------------------------------
>
>             Summary: RM with lost NMs results in massive log of
>AppAttemptId doesnt exist in cache
>                 Key: MAPREDUCE-4672
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4672
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 0.23.1
>            Reporter: Chris Riccomini
>
>
>Hey Guys,
>
>I'm running a 9-node cluster with 8 NMs and a single RM node. If I run an
>app master and have that app master start a container, then shut down all
>NM nodes (to simulate a complete failure), the containers time out and fail,
>as expected.
>
>What's unexpected is that my log then starts filling with:
>
>
>2012-09-21 18:02:02,614 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>2012-09-21 18:02:03,617 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>2012-09-21 18:02:04,618 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>2012-09-21 18:02:05,620 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>2012-09-21 18:02:06,621 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>2012-09-21 18:02:07,623 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>2012-09-21 18:02:08,624 ERROR resourcemanager.ApplicationMasterService
>(ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>in cache appattempt_1348248013002_0001_000001
>
>Is there any way to shut this off/fix it? It just keeps going forever,
>until I bounce the RM node.
>
>Thanks!
>Chris
>
>--
>This message is automatically generated by JIRA.
>If you think it was sent incorrectly, please contact your JIRA
>administrators
>For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: RM with lost NMs results in massive log of AppAttemptId doesnt exist in cache

Posted by Arun C Murthy <ac...@hortonworks.com>.
This looks like a silly bug to me; we should definitely fix the logging. Thanks for logging this, Chris!
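
For illustration only, here is a minimal sketch of one way the repeated message could be throttled so that each unknown attempt ID is reported once instead of on every allocate call. It is not the actual ResourceManager code and not necessarily how MAPREDUCE-4672 will be fixed; the class UnknownAttemptLogThrottle and its methods are hypothetical names made up for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Hadoop): remembers which attempt IDs
 * have already been reported so the same error is logged only once.
 */
final class UnknownAttemptLogThrottle {
    // Thread-safe set, since allocate calls may arrive concurrently.
    private final Set<String> alreadyReported = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given attempt ID is seen. */
    boolean shouldLog(String appAttemptId) {
        return alreadyReported.add(appAttemptId);
    }

    /** Drops an attempt ID so the set does not grow without bound. */
    void forget(String appAttemptId) {
        alreadyReported.remove(appAttemptId);
    }

    public static void main(String[] args) {
        UnknownAttemptLogThrottle throttle = new UnknownAttemptLogThrottle();
        String attempt = "appattempt_1348248013002_0001_000001";
        // Simulate the app master retrying allocate once per second.
        for (int i = 0; i < 7; i++) {
            if (throttle.shouldLog(attempt)) {
                System.err.println("AppAttemptId doesnt exist in cache " + attempt);
            }
        }
    }
}
```

With a guard like this, the seven identical errors in the report would collapse to a single line. Whether to drop repeats entirely, log them at a lower level, or log at most once per interval is a design choice this sketch does not settle.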


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/