Posted to yarn-issues@hadoop.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2021/05/13 04:23:00 UTC

[jira] [Updated] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy

     [ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated YARN-10324:
-----------------------------------
    Target Version/s: 3.3.2, 2.7.8  (was: 2.7.8, 3.3.1)

> Fetch data from NodeManager may cause read timeout when disk is busy
> --------------------------------------------------------------------
>
>                 Key: YARN-10324
>                 URL: https://issues.apache.org/jira/browse/YARN-10324
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: auxservices
>    Affects Versions: 2.7.0, 3.2.1
>            Reporter: Yao Guangdong
>            Priority: Minor
>              Labels: patch
>         Attachments: YARN-10324.001.patch, YARN-10324.002.patch
>
>
>  As the cluster size grows larger and larger, the time reducers spend fetching map output from the NodeManager grows as well. We often see WARN logs like the following in the reducer logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  We checked the NodeManager server and found that disk IO utilization and the number of connections became very high when the read timeouts happened. Our analysis: with 20,000 maps and 1,000 reduces, the NodeManager performs about 20 million IO stream operations during the shuffle phase. When each reducer fetches only a small amount of data from a map output file, these many small reads drive disk IO utilization very high in a big cluster, read timeouts happen frequently, and application completion time grows.
> We found that ShuffleHandler already has an IndexCache for caching the file.out.index files. We therefore wanted to turn the many small IOs into a few big IOs, reducing the number of small disk reads. Our approach caches all the data of a small map output file (file.out) in memory when the first fetch request arrives; subsequent fetch requests then read from memory, avoiding disk IO entirely. After we cached the data in memory, the read timeouts disappeared.
>  
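[Editor's sketch] The caching idea described above can be illustrated with a minimal, self-contained Java sketch. This is not the actual YARN-10324 patch: the class name, the size threshold, and the fetch method are all hypothetical, and real ShuffleHandler code would also need eviction and memory accounting. It only shows the core trick of reading a small file once with one big sequential read and serving later fetches from memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch (not the actual YARN-10324 patch): cache whole small
// map-output files (file.out) in memory on the first fetch so subsequent
// fetchers avoid disk IO entirely.
public class SmallShuffleFileCache {
    private final long maxCachedFileSize;  // only keep "small" files resident
    private final ConcurrentMap<Path, byte[]> cache = new ConcurrentHashMap<>();

    public SmallShuffleFileCache(long maxCachedFileSize) {
        this.maxCachedFileSize = maxCachedFileSize;
    }

    /** Returns the file's bytes, touching disk only on the first request. */
    public byte[] fetch(Path file) throws IOException {
        byte[] cached = cache.get(file);
        if (cached != null) {
            return cached;                        // served from memory
        }
        byte[] data = Files.readAllBytes(file);   // one big sequential read
        if (data.length <= maxCachedFileSize) {
            cache.putIfAbsent(file, data);        // keep small files resident
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("file.out", null);
        Files.write(tmp, "map output".getBytes(StandardCharsets.UTF_8));
        SmallShuffleFileCache cache = new SmallShuffleFileCache(1024 * 1024);
        cache.fetch(tmp);                         // first fetch: disk read
        Files.delete(tmp);                        // file is gone from disk...
        byte[] second = cache.fetch(tmp);         // ...yet still served from memory
        System.out.println(new String(second, StandardCharsets.UTF_8));
    }
}
```

A real implementation would bound total cached bytes (not just per-file size) and invalidate entries when map outputs are deleted; the per-file threshold here simply mirrors the issue's focus on small map outputs.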



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org