You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-dev@hadoop.apache.org by "Haibo Chen (JIRA)" <ji...@apache.org> on 2017/02/22 23:10:44 UTC

[jira] [Resolved] (MAPREDUCE-6437) Add retry on some connection exception on job commit phase

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen resolved MAPREDUCE-6437.
-----------------------------------
    Resolution: Duplicate

Closing this as a duplicate of MAPREDUCE-5485

> Add retry on some connection exception on job commit phase
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-6437
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6437
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Bing Jiang
>
> {code}
> Job commit failed: java.net.ConnectException: Call From TS-DN-167/172.22.5.167 to SHYF-H11-BH03:52310 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> at org.apache.hadoop.ipc.Client.call(Client.java:1415)
> at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy14.create(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.create(Unknown Source)
> at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1645)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1627)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1552)
> at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:396)
> at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:392)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:392)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:336)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.touchz(CommitterEventHandler.java:244)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:250)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection timed out
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
> at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> ... 28 more
> {code}
> Check the code, there is no chance to make another application master attempt if it encounters the issue of connection. So could we identify the exception, and make another retry or kick off another AM attempt?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org