Posted to yarn-issues@hadoop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/15 11:42:04 UTC

[jira] [Commented] (YARN-5924) Resource Manager fails to load state with InvalidProtocolBufferException

    [ https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010362#comment-16010362 ] 

ASF GitHub Bot commented on YARN-5924:
--------------------------------------

Github user ameks94 commented on the issue:

    https://github.com/apache/hadoop/pull/164
  
    I realized that the current solution (letting the RM start even when an application's data is broken) is not a good one.
    It is better to crash the RM when the file holding an application's state is broken. In that case we can report in more detail which file is broken (and perhaps recommend removing the application's folder with the broken data so that the RM can start successfully).
    Second, the most important part of the fix should be to find out why the file gets corrupted and how to prevent that.
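
A minimal sketch of the fail-fast idea described above, not the actual patch: the loader stops at the first unreadable state file and names it in the error, together with the suggested recovery step. Everything here is hypothetical (the class `FailFastStateLoader`, the in-memory map standing in for the state store, the toy length-prefixed record format, and the example keys); a plain `IOException` stands in for protobuf's `InvalidProtocolBufferException`, which is a subclass of `IOException`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Hadoop.
class FailFastStateLoader {

    // Stand-in for a protobuf parseFrom call. Toy format: the first byte
    // holds the payload length. Real code would see
    // InvalidProtocolBufferException (a subclass of IOException) instead.
    static String parse(byte[] data) throws IOException {
        if (data.length == 0 || data[0] != data.length - 1) {
            throw new IOException("malformed record");
        }
        return new String(data, 1, data.length - 1, StandardCharsets.US_ASCII);
    }

    // Helper that builds a well-formed toy record.
    static byte[] record(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) body.length;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    // Loads every file; aborts on the first broken one and reports which
    // file it was and what the operator might do about it.
    static Map<String, String> loadAll(Map<String, byte[]> files) throws IOException {
        Map<String, String> loaded = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            try {
                loaded.put(e.getKey(), parse(e.getValue()));
            } catch (IOException ex) {
                throw new IOException("Cannot recover application state from '"
                        + e.getKey() + "': the file appears to be corrupt. "
                        + "Removing this application's directory from the state "
                        + "store may allow the RM to start.", ex);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("RMAppRoot/app_1/app_1", record("RUNNING"));
        files.put("RMAppRoot/app_2/app_2", new byte[] {9, 1}); // length byte does not match
        try {
            loadAll(files);
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Wrapping the original exception keeps the parser's stack trace as the cause while putting the file name where an operator will see it first.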


> Resource Manager fails to load state with InvalidProtocolBufferException
> ------------------------------------------------------------------------
>
>                 Key: YARN-5924
>                 URL: https://issues.apache.org/jira/browse/YARN-5924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Oleksii Dymytrov
>            Assignee: Oleksii Dymytrov
>         Attachments: YARN-5924.002.patch
>
>
> InvalidProtocolBufferException is thrown while recovering an application's state if the application's data under the FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS has an invalid format (or is broken):
> {noformat}
> com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
> 	at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
> 	at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
> 	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> 	at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
> {noformat}
> A possible solution is to catch "InvalidProtocolBufferException", log a warning, and remove the application's folder that contains the invalid data, so that the RM restart does not fail.
> Additionally, I've added a catch for other exceptions that can occur while recovering a specific application, so that the RM does not fail when just one application's state can't be loaded.
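
The approach in the description can be sketched roughly as follows; this is an illustration under stated assumptions, not the attached patch. The class `SkipCorruptStateLoader`, the in-memory map standing in for the state store, the toy length-prefixed record format, and the example keys are all hypothetical. A plain `IOException` stands in for `InvalidProtocolBufferException` (which extends `IOException`), and the sketch only records which applications were skipped; deleting their folders is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Hadoop.
class SkipCorruptStateLoader {
    final Map<String, String> loaded = new LinkedHashMap<>();
    final List<String> skipped = new ArrayList<>();

    // Stand-in for a protobuf parseFrom call. Toy format: the first byte
    // holds the payload length.
    static String parse(byte[] data) throws IOException {
        if (data.length == 0 || data[0] != data.length - 1) {
            throw new IOException("malformed record");
        }
        return new String(data, 1, data.length - 1, StandardCharsets.US_ASCII);
    }

    // Helper that builds a well-formed toy record.
    static byte[] record(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) body.length;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    // Loads each application independently: a failure in one application is
    // logged and remembered, and loading continues with the next one.
    void load(Map<String, byte[]> appFiles) {
        for (Map.Entry<String, byte[]> e : appFiles.entrySet()) {
            try {
                loaded.put(e.getKey(), parse(e.getValue()));
            } catch (IOException | RuntimeException ex) {
                System.err.println("WARN: skipping application " + e.getKey()
                        + ", state could not be loaded: " + ex.getMessage());
                skipped.add(e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("app_1", record("RUNNING"));
        files.put("app_2", new byte[] {9, 1}); // length byte does not match
        files.put("app_3", record("FINISHED"));
        SkipCorruptStateLoader loader = new SkipCorruptStateLoader();
        loader.load(files);
        System.out.println("loaded=" + loader.loaded.keySet() + " skipped=" + loader.skipped);
    }
}
```

The trade-off is that a skipped application is silently lost from the recovered state unless someone reads the warning, which is the concern raised in the comment above.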



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
