You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Rakesh Shah (JIRA)" <ji...@apache.org> on 2019/02/27 05:22:00 UTC

[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

    [ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778903#comment-16778903 ] 

Rakesh Shah commented on YARN-6054:
-----------------------------------

Hi [~raviprak],

As we can see the backup is taken while the first time when exception occurs, so there is a chance that there may be loss of data.

As it is is trying to repair the level db directory with the backup one.

And the backup one may not have all the files in backup ?.

> TimelineServer fails to start when some LevelDb state files are missing.
> ------------------------------------------------------------------------
>
>                 Key: YARN-6054
>                 URL: https://issues.apache.org/jira/browse/YARN-6054
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>            Priority: Critical
>             Fix For: 2.9.0, 3.0.0-alpha2, 2.8.2
>
>         Attachments: YARN-6054.01.patch, YARN-6054.02.patch, YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the TimelineServer should have graceful degradation instead of failing to start at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org