Posted to yarn-dev@hadoop.apache.org by Abhishek Das <ab...@gmail.com> on 2017/03/02 23:59:04 UTC

LevelDB corruption in YARN Application TimelineServer

Hi,

I am running a Hadoop 2.6.0 cluster on EC2, with an r3.2xlarge instance as
the master node. The YARN Application Timeline Server running on the master
node is throwing an exception because of LevelDB corruption. The issue
seems to appear once the cluster has been up for a long time (more than
7 days). The stack trace is below.

ERROR org.apache.hadoop.yarn.server.timeline.TimelineDataManager: Skip the timeline entity: { id: <task_id>, type: TEZ_TASK_ID }
java.lang.RuntimeException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb/330951.sst: No such file or directory
        at org.fusesource.leveldbjni.internal.JniDBIterator.seek(JniDBIterator.java:68)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntity(LeveldbTimelineStore.java:444)
        at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:257)
        at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:259)
        at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1242)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)

There are a lot of .sst files in the LevelDB directory:

    sudo ls -lrt /media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb/ | wc -l
    3848
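The number above comes from ls -lrt piped into wc -l, which counts every entry in the directory plus the "total" line that ls -l prints first, so it slightly overstates the number of .sst table files. A minimal sketch, assuming only the JDK, that counts .sst files separately from LevelDB's other files (CURRENT, LOCK, LOG, MANIFEST-*, *.log). The class name SstCount is hypothetical, and the default path is simply the one from the error message above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: counts the .sst table files in a LevelDB directory
 * separately from every other entry (CURRENT, LOCK, LOG, MANIFEST-*, *.log, ...).
 */
class SstCount {

    /** Returns {number of .sst files, number of other entries} in dir. */
    static long[] countEntries(Path dir) throws IOException {
        long sst = 0;
        long other = 0;
        // DirectoryStream lists entries lazily and must be closed, hence try-with-resources.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (p.getFileName().toString().endsWith(".sst")) {
                    sst++;
                } else {
                    other++;
                }
            }
        }
        return new long[] {sst, other};
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the timeline store path from the stack trace; override with args[0].
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "/media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb");
        long[] counts = countEntries(dir);
        System.out.println(".sst files: " + counts[0] + ", other entries: " + counts[1]);
    }
}
```

A large .sst count on its own is not necessarily a sign of corruption, since LevelDB keeps many table files across its levels; the error above is about one specific file that the database still expects but that is no longer on disk.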

After this error, the ResourceManager and the Tez ApplicationMaster are
unable to post entities to the YARN ATS, so we cannot see the history of
running jobs.

Does anyone know the root cause of this LevelDB corruption and how to get
rid of this issue?

Thanks in advance.

Regards,
Abhishek

Re: LevelDB corruption in YARN Application TimelineServer

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Abhishek!

You might also want to pull in
https://issues.apache.org/jira/browse/YARN-6054 .

HTH
Ravi

On Mon, Mar 6, 2017 at 8:39 AM, Jason Lowe <jl...@yahoo-inc.com.invalid>
wrote:

> Verify that nothing outside of Hadoop/YARN (e.g. tmpwatch) is periodically
> removing "old" files. Users have reported similar cases in the past that
> were tracked down to an invalid setup: state was being corrupted by a
> periodic cleanup tool, such as tmpwatch, removing files.
> Jason

Re: LevelDB corruption in YARN Application TimelineServer

Posted by Jason Lowe <jl...@yahoo-inc.com.INVALID>.
Verify that nothing outside of Hadoop/YARN (e.g. tmpwatch) is periodically removing "old" files. Users have reported similar cases in the past that were tracked down to an invalid setup: state was being corrupted by a periodic cleanup tool, such as tmpwatch, removing files.
Jason
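One way to follow up on the cleanup-tool theory is to see which files in the timeline store an age-based sweep would be likely to pick. A rough sketch, assuming only the JDK: it lists regular files whose last-modified time is older than a given number of days. StaleFileCheck, olderThan and the 7-day default are hypothetical choices for illustration (7 days only echoes the uptime mentioned in the original report). It uses mtime as a proxy; tools such as tmpwatch may decide on access time instead, so treat the result as an indication, not proof.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists files an age-based cleaner could consider "old". */
class StaleFileCheck {

    /** Returns regular files in dir whose last-modified time is before (now - maxAge), sorted by path. */
    static List<Path> olderThan(Path dir, Duration maxAge, Instant now) throws IOException {
        Instant cutoff = now.minus(maxAge);
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)) {
                    stale.add(p);
                }
            }
        }
        stale.sort(null); // null comparator = natural Path ordering
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the timeline store path from the original stack trace.
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "/media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb");
        // Hypothetical threshold in days; match it to the age limit of any cleaner you find.
        long days = args.length > 1 ? Long.parseLong(args[1]) : 7;
        List<Path> stale = olderThan(dir, Duration.ofDays(days), Instant.now());
        System.out.println(stale.size() + " file(s) older than " + days + " day(s):");
        for (Path p : stale) {
            System.out.println("  " + p.getFileName());
        }
    }
}
```

Old .sst files are expected in a healthy store, because LevelDB leaves table files in its deeper levels untouched until compaction rewrites them. That is exactly why a sweep keyed on file age can delete files the database still needs, so any cleaner whose target directories include this path deserves a closer look.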