Posted to yarn-issues@hadoop.apache.org by "Zac Zhou (JIRA)" <ji...@apache.org> on 2018/12/24 09:11:00 UTC
[jira] [Commented] (YARN-9155) Can't submit a submarine job, if a
previous job with the same service name has finished
[ https://issues.apache.org/jira/browse/YARN-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728276#comment-16728276 ]
Zac Zhou commented on YARN-9155:
--------------------------------
Could we add a parameter to the Service API to indicate that the service should clean up its HDFS path and ZooKeeper node when the job finishes? The cleanup logic could then be added in ServiceUtils.ProcessTerminationHandler.
[~leftnoteasy], [~tangzhankun], [~liuxun323], it would be great if you could share your comments or advice.
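To make the proposal concrete, here is a rough sketch of the behaviour I have in mind. It is only an illustration, not actual YARN code: it uses the local filesystem (java.nio.file) as a stand-in for HDFS, leaves out the ZooKeeper node entirely, and the names ServiceSpec, cleanupOnFinish, create and onFinished are all hypothetical. The create step mimics the existing "instance dir already exists" check, and the termination hook removes the directory only when the new flag is set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class ServiceCleanupSketch {

    /** Hypothetical service spec; cleanupOnFinish stands for the proposed new parameter. */
    static final class ServiceSpec {
        final String name;
        final boolean cleanupOnFinish;

        ServiceSpec(String name, boolean cleanupOnFinish) {
            this.name = name;
            this.cleanupOnFinish = cleanupOnFinish;
        }
    }

    /** Mimics the existing check: refuse to create a service whose instance dir exists. */
    static Path create(Path servicesRoot, ServiceSpec spec) throws IOException {
        Path dir = servicesRoot.resolve(spec.name);
        if (Files.exists(dir)) {
            throw new IllegalStateException("Service Instance dir already exists: " + dir);
        }
        Files.createDirectories(dir);
        // Placeholder for the persisted service definition.
        Files.write(dir.resolve(spec.name + ".json"), new byte[0]);
        return dir;
    }

    /** Hypothetical termination hook: delete the instance dir only if the flag is set. */
    static void onFinished(Path servicesRoot, ServiceSpec spec) throws IOException {
        if (!spec.cleanupOnFinish) {
            return; // current behaviour: leave everything in place
        }
        Path dir = servicesRoot.resolve(spec.name);
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("services");
        ServiceSpec spec = new ServiceSpec("distributed-tf-gpu-ml4", true);
        create(root, spec);
        onFinished(root, spec);
        create(root, spec); // would throw if the first run had not been cleaned up
        System.out.println("resubmitted OK");
    }
}
```

With cleanupOnFinish set to false the second create call fails the same way the report describes, so the default could keep today's behaviour and only jobs that opt in would be cleaned up.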
Thanks
> Can't submit a submarine job, if a previous job with the same service name has finished
> ---------------------------------------------------------------------------------------
>
> Key: YARN-9155
> URL: https://issues.apache.org/jira/browse/YARN-9155
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zac Zhou
> Assignee: Zac Zhou
> Priority: Major
>
> YARN native service doesn't clean up its HDFS service path when a service finishes.
> So if we don't run the "yarn app -destroy" command before the next run of a submarine job, we get the following exception:
> 2018-12-24 11:38:02,493 ERROR org.apache.hadoop.yarn.service.utils.CoreFileSystem: Dir /user/hadoop/****/services/distributed-tf-gpu-ml4/${service_name}.json exists: hdfs://mldev/user/hadoop/******/services/distributed-tf-gpu-ml4/${service_name}.json 8472
> 2018-12-24 11:38:02,494 ERROR org.apache.hadoop.yarn.service.webapp.ApiServer: Failed to create service ${service_name}: {}
> java.lang.reflect.UndeclaredThrowableException
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748)
> at org.apache.hadoop.yarn.service.webapp.ApiServer.createService(ApiServer.java:131)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
> at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
> at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
> at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
> at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
> at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
> at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
> at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
> at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
> at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
> at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
> at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
> at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
> at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
> at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
> at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1610)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:539)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.yarn.service.exceptions.SliderException: Service Instance dir already exists: /user/hadoop/********/services/distributed-tf-gpu-ml4/${service_name}.json
> at org.apache.hadoop.yarn.service.utils.CoreFileSystem.verifyDirectoryNonexistent(CoreFileSystem.java:260)
> at org.apache.hadoop.yarn.service.client.ServiceClient.checkAppNotExistOnHdfs(ServiceClient.java:1181)
> at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:484)
> at org.apache.hadoop.yarn.service.webapp.ApiServer$2.run(ApiServer.java:137)
> at org.apache.hadoop.yarn.service.webapp.ApiServer$2.run(ApiServer.java:131)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> ... 67 more