Posted to issues@tez.apache.org by "Prasanth Jayachandran (Jira)" <ji...@apache.org> on 2020/05/09 02:00:00 UTC

[jira] [Commented] (TEZ-4174) [Kubernetes] Fetcher should report connection failure on SocketException

    [ https://issues.apache.org/jira/browse/TEZ-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103055#comment-17103055 ] 

Prasanth Jayachandran commented on TEZ-4174:
--------------------------------------------

cc/ [~rajesh.balamohan] [~gopalv]

> [Kubernetes] Fetcher should report connection failure on SocketException
> ------------------------------------------------------------------------
>
>                 Key: TEZ-4174
>                 URL: https://issues.apache.org/jira/browse/TEZ-4174
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>
> Fetcher treats a fetch error as a connection failure only when http.connect throws an exception. In a Kubernetes environment, where there can be intermediate proxies, getInputStream on the HTTP connection can also throw a connection reset error (5xx). These errors should be treated as connection failures as well.
> {code:java}
> 2020-05-08 17:03:54.080  WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1588982534035_0000_1_00_000000_0_10030, spillType=0, spillId=-1] Informing ShuffleManager:
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:210)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
>         at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>         at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>         at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748) {code}
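
The proposed behaviour could be sketched roughly as follows. This is only an illustration of the idea in the description, not the actual Tez Fetcher code: the class FetchFailureSketch, the ShuffleConnection interface, and the SetupResult enum are all hypothetical names introduced here. The point is that a SocketException raised by getInputStream is classified the same way as a failure in connect.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketException;

// Hypothetical sketch; none of these names come from the Tez code base.
final class FetchFailureSketch {

    /** Hypothetical stand-in for the HTTP connection wrapper used by the fetcher. */
    public interface ShuffleConnection {
        boolean connect() throws IOException;
        InputStream getInputStream() throws IOException;
    }

    /** How the setup attempt ended. */
    public enum SetupResult { OK, CONNECT_FAILURE, READ_FAILURE }

    public static SetupResult setupConnection(ShuffleConnection conn) {
        try {
            conn.connect();
        } catch (IOException e) {
            // Already treated as a connection failure today, per the description.
            return SetupResult.CONNECT_FAILURE;
        }
        try {
            conn.getInputStream();
            return SetupResult.OK;
        } catch (SocketException e) {
            // Proposed change: a reset while reading the response (for example
            // behind an intermediate proxy) also counts as a connection failure.
            return SetupResult.CONNECT_FAILURE;
        } catch (IOException e) {
            // Any other I/O error keeps its separate classification.
            return SetupResult.READ_FAILURE;
        }
    }

    public static void main(String[] args) {
        ShuffleConnection resetByProxy = new ShuffleConnection() {
            public boolean connect() { return true; }
            public InputStream getInputStream() throws IOException {
                throw new SocketException("Connection reset");
            }
        };
        ShuffleConnection healthy = new ShuffleConnection() {
            public boolean connect() { return true; }
            public InputStream getInputStream() {
                return new ByteArrayInputStream(new byte[0]);
            }
        };
        System.out.println("reset by proxy: " + setupConnection(resetByProxy));
        System.out.println("healthy:        " + setupConnection(healthy));
    }
}
```

The SocketException catch has to come before the general IOException catch because SocketException is a subclass of IOException; the compiler rejects the reverse order.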



--
This message was sent by Atlassian Jira
(v8.3.4#803005)