Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/07/28 20:06:38 UTC

[jira] [Commented] (MAPREDUCE-6011) Improve history server behavior during a recovery error

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076493#comment-14076493 ] 

Jason Lowe commented on MAPREDUCE-6011:
---------------------------------------

Sample error where a bad token state failed history server startup but didn't explain which file contained the bad token state:

{noformat}
2014-07-11 22:51:14,977 [main] INFO impl.MetricsSystemImpl: JobHistoryServer metrics system started
2014-07-11 22:51:16,079 [main] INFO hs.HistoryServerFileSystemStateStoreService: Loading history server state from hdfs:/xx
2014-07-11 22:51:46,747 [main] INFO service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService failed in state STARTED; cause: java.io.EOFException
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:179)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadToken(HistoryServerFileSystemStateStoreService.java:295)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokensFromBucket(HistoryServerFileSystemStateStoreService.java:314)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokens(HistoryServerFileSystemStateStoreService.java:353)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokenState(HistoryServerFileSystemStateStoreService.java:367)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadState(HistoryServerFileSystemStateStoreService.java:114)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService.serviceStart(JobHistoryServer.java:89)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:194)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:220)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:229)
2014-07-11 22:51:46,749 [main] INFO service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: java.io.EOFException
org.apache.hadoop.service.ServiceStateException: java.io.EOFException
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:194)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:220)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:229)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:179)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadToken(HistoryServerFileSystemStateStoreService.java:295)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokensFromBucket(HistoryServerFileSystemStateStoreService.java:314)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokens(HistoryServerFileSystemStateStoreService.java:353)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokenState(HistoryServerFileSystemStateStoreService.java:367)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadState(HistoryServerFileSystemStateStoreService.java:114)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService.serviceStart(JobHistoryServer.java:89)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        ... 5 more
2014-07-11 22:51:46,750 [main] INFO impl.MetricsSystemImpl: Stopping JobHistoryServer metrics system...
{noformat}

Note the lack of details on which token was being loaded.  Also, the failure is logged at INFO; it should be logged at least at the WARN level if we let the JHS continue past this error, or at the ERROR level if it remains fatal to startup.
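
To make the two options concrete, here is a rough, self-contained sketch of the idea. It is not the actual HistoryServerFileSystemStateStoreService code and not a proposed patch: the class TokenRecoverySketch, its methods, the failOnError flag, and the one-byte-plus-long record layout are all hypothetical stand-ins, and java.util.logging is used only to keep the example free of dependencies (SEVERE plays the role of ERROR). The point is that the read failure is wrapped with the path of the file being loaded, and that the log level depends on whether the error is fatal.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TokenRecoverySketch {
    private static final Logger LOG =
        Logger.getLogger(TokenRecoverySketch.class.getName());

    /** Hypothetical record layout (one byte, then a long); real token files differ. */
    public static long readTokenRecord(byte[] contents) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(contents))) {
            in.readByte();          // throws EOFException on a truncated file
            return in.readLong();
        }
    }

    /** Wraps any read failure so the message names the offending file. */
    public static long loadToken(String path, byte[] contents) throws IOException {
        try {
            return readTokenRecord(contents);
        } catch (IOException e) {
            throw new IOException("Error reading token state from " + path, e);
        }
    }

    /**
     * Loads every file. If failOnError is true, the first bad file is logged
     * at SEVERE and rethrown; otherwise it is logged at WARNING and skipped.
     * Returns the paths that were skipped.
     */
    public static List<String> loadTokens(Map<String, byte[]> files,
                                          boolean failOnError) throws IOException {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            try {
                loadToken(file.getKey(), file.getValue());
            } catch (IOException e) {
                if (failOnError) {
                    LOG.log(Level.SEVERE, "Fatal error during token recovery", e);
                    throw e;
                }
                LOG.log(Level.WARNING,
                        "Skipping unreadable token file " + file.getKey(), e);
                skipped.add(file.getKey());
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("tokens/good", new byte[9]);      // 1 byte + 8-byte long
        files.put("tokens/truncated", new byte[0]); // simulates the EOF case
        System.out.println("Skipped: " + loadTokens(files, false));
    }
}
```

With failOnError set to true the same truncated file still aborts the load, but the exception message now names tokens/truncated, which is the detail missing from the trace above.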

> Improve history server behavior during a recovery error
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>
> Currently when the history server encounters an error during recovery it is fatal without specific details on the error (e.g. which token was involved during the recovery error).  We should either allow the history server to proceed past recovery errors or provide more specifics on the offending token involved in the fatal error to aid in manual recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)