Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2015/06/11 07:14:00 UTC

[jira] [Created] (TEZ-2550) DAGAppMaster gets locked up due to ATS

Rajesh Balamohan created TEZ-2550:
-------------------------------------

             Summary: DAGAppMaster gets locked up due to ATS
                 Key: TEZ-2550
                 URL: https://issues.apache.org/jira/browse/TEZ-2550
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan



{noformat}

Thread 30453: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
 - java.net.SocketInputStream.read(byte[], int, int, int) @bci=79, line=150 (Compiled frame)
 - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=121 (Compiled frame)
 - java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
 - java.io.BufferedInputStream.read1(byte[], int, int) @bci=44, line=286 (Compiled frame)
 - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=345 (Compiled frame)
 - sun.net.www.http.HttpClient.parseHTTPHeader(sun.net.www.MessageHeader, sun.net.ProgressSource, sun.net.www.protocol.http.HttpURLConnection) @bci=51, line=703 (Compiled frame)
 - sun.net.www.http.HttpClient.parseHTTP(sun.net.www.MessageHeader, sun.net.ProgressSource, sun.net.www.protocol.http.HttpURLConnection) @bci=56, line=647 (Compiled frame)
 - sun.net.www.protocol.http.HttpURLConnection.getInputStream0() @bci=327, line=1534 (Compiled frame)
 - sun.net.www.protocol.http.HttpURLConnection.getInputStream() @bci=52, line=1439 (Compiled frame)
 - java.net.HttpURLConnection.getResponseCode() @bci=16, line=480 (Compiled frame)
 - com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(com.sun.jersey.api.client.ClientRequest) @bci=272, line=240 (Interpreted frame)
 - com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(com.sun.jersey.api.client.ClientRequest) @bci=2, line=147 (Interpreted frame)
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run() @bci=11, line=226 (Interpreted frame)
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientRetryOp) @bci=11, line=162 (Interpreted frame)
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(com.sun.jersey.api.client.ClientRequest) @bci=18, line=237 (Interpreted frame)
 - com.sun.jersey.api.client.Client.handle(com.sun.jersey.api.client.ClientRequest) @bci=35, line=648 (Interpreted frame)
 - com.sun.jersey.api.client.WebResource.handle(java.lang.Class, com.sun.jersey.api.client.ClientRequest) @bci=10, line=670 (Interpreted frame)
 - com.sun.jersey.api.client.WebResource.access$200(com.sun.jersey.api.client.WebResource, java.lang.Class, com.sun.jersey.api.client.ClientRequest) @bci=3, line=74 (Compiled frame)
 - com.sun.jersey.api.client.WebResource$Builder.post(java.lang.Class, java.lang.Object) @bci=12, line=563 (Compiled frame)
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(java.lang.Object, java.lang.String) @bci=41, line=472 (Compiled frame)
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(java.lang.Object, java.lang.String) @bci=3, line=321 (Compiled frame)
 - org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(org.apache.hadoop.yarn.api.records.timeline.TimelineEntity[]) @bci=55, line=301 (Compiled frame)
 - org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(java.util.List) @bci=188, line=343 (Compiled frame)
 - org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.serviceStop() @bci=273, line=229 (Interpreted frame)
 - org.apache.hadoop.service.AbstractService.stop() @bci=32, line=221 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stop(org.apache.hadoop.service.Service) @bci=5, line=52 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stopQuietly(org.apache.commons.logging.Log, org.apache.hadoop.service.Service) @bci=1, line=80 (Interpreted frame)
 - org.apache.hadoop.service.CompositeService.stop(int, boolean) @bci=115, line=157 (Interpreted frame)
 - org.apache.hadoop.service.CompositeService.serviceStop() @bci=58, line=131 (Interpreted frame)
 - org.apache.tez.dag.history.HistoryEventHandler.serviceStop() @bci=11, line=80 (Interpreted frame)
 - org.apache.hadoop.service.AbstractService.stop() @bci=32, line=221 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stop(org.apache.hadoop.service.Service) @bci=5, line=52 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stopQuietly(org.apache.commons.logging.Log, org.apache.hadoop.service.Service) @bci=1, line=80 (Interpreted frame)
 - org.apache.hadoop.service.ServiceOperations.stopQuietly(org.apache.hadoop.service.Service) @bci=4, line=65 (Interpreted frame)
 - org.apache.tez.dag.app.DAGAppMaster.stopServices() @bci=137, line=1675 (Interpreted frame)
 - org.apache.tez.dag.app.DAGAppMaster.serviceStop() @bci=30, line=1831 (Interpreted frame)
 - org.apache.hadoop.service.AbstractService.stop() @bci=32, line=221 (Interpreted frame)
 - org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run() @bci=48, line=840 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

.....
.....
.....
.....


Thread 26211: (state = BLOCKED)
 - org.apache.tez.dag.app.DAGAppMaster.shutdownTezAM() @bci=0, line=1176 (Interpreted frame)
 - org.apache.tez.dag.api.client.DAGClientHandler.shutdownAM() @bci=22, line=124 (Interpreted frame)
 - org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.shutdownSession(com.google.protobuf.RpcController, org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$ShutdownSessionRequestProto) @bci=55, line=179 (Interpreted frame)
 - org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=152, line=7473 (Compiled frame)
 - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=619 (Compiled frame)
 - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=962 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2039 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2035 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1628 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Handler.run() @bci=308, line=2033 (Interpreted frame)
{noformat}

DAGAppMaster.serviceStop() acquires a lock that is never released, because the ATS connection it is waiting on never returns from the blocking socket read (I expected a socket read timeout to apply here, but the call never comes out of the read). As a result, shutdownTezAM() blocks on the same lock and ends up occupying the slot.
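
For illustration only, here is a minimal sketch of the pattern the two stack traces suggest. It is not Tez code: the class StopLockSketch and its methods are hypothetical names, and a CountDownLatch stands in for the HTTP response from ATS that never arrives. It assumes both the stop path and the shutdown call synchronize on the same object.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for an app master; not the actual DAGAppMaster.
public class StopLockSketch {
    // Stand-in for the ATS response that never arrives (assumption).
    private final CountDownLatch response = new CountDownLatch(1);
    // Signals that stop() has taken the monitor.
    private final CountDownLatch stopEntered = new CountDownLatch(1);

    // Holds this object's monitor for as long as the "read" is stuck.
    public synchronized void stop() throws InterruptedException {
        stopEntered.countDown();
        response.await(); // unlike Object.wait(), this does not release the monitor
    }

    // Needs the same monitor, so it cannot even start until stop() returns.
    public synchronized void shutdown() {
    }

    // Starts both callers and reports the state of the shutdown() caller
    // while stop() is still waiting.
    public static Thread.State observe() {
        StopLockSketch am = new StopLockSketch();
        Thread stopper = new Thread(() -> {
            try {
                am.stop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread rpc = new Thread(am::shutdown);
        try {
            stopper.start();
            am.stopEntered.await(); // make sure stop() owns the monitor first
            rpc.start();
            Thread.State state = rpc.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = rpc.getState();
            }
            am.response.countDown(); // let the stuck "read" finish so both threads exit
            stopper.join();
            rpc.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

The second thread stays BLOCKED until the first one leaves stop(), which matches Thread 26211 sitting at bci=0 of shutdownTezAM(). Whether the real fix is a bounded read on the ATS client or doing the flush outside the lock is not settled here.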

This happened with the latest Tez master (commit ce26b3f52761d2a48a612a7613d99b712a320204). I am not sure whether this is consistently reproducible; creating this ticket as a placeholder.


