Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/10/15 17:38:44 UTC
[jira] [Commented] (MAPREDUCE-5584) ShuffleHandler becomes unresponsive during gridmix runs and can leak file descriptors
[ https://issues.apache.org/jira/browse/MAPREDUCE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795315#comment-13795315 ]
Jason Lowe commented on MAPREDUCE-5584:
---------------------------------------
The CLOSE_WAIT issue is caused by the ShuffleHandler not closing connections under some error conditions. Here is a sample backtrace:
{noformat}
2013-10-03 21:15:07,307 [New I/O worker #31] DEBUG org.apache.hadoop.mapred.ShuffleHandler: Ignoring closed channel error
java.nio.channels.ClosedChannelException
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:168)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:121)
    at org.jboss.netty.channel.Channels.write(Channels.java:704)
    at org.jboss.netty.channel.Channels.write(Channels.java:671)
    at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:612)
    at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
{noformat}
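For readers unfamiliar with CLOSE_WAIT: a socket sits in that state when the remote peer has closed its side but the local process has not yet closed its own descriptor. The sketch below illustrates the general pattern of closing the connection on the error path so the descriptor is released. It uses plain java.net sockets rather than Netty so that it is self-contained, and the class and method names (CloseOnErrorSketch, sendOrClose, closeQuietly) are hypothetical. It is not the ShuffleHandler code or the fix for this issue.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; not taken from ShuffleHandler or any patch.
final class CloseOnErrorSketch {

    // Writes the payload to the peer and returns true on success.
    // On any I/O failure the socket is closed before returning false,
    // so the descriptor is released instead of being left open.
    static boolean sendOrClose(Socket socket, byte[] payload) {
        try {
            OutputStream out = socket.getOutputStream();
            out.write(payload);
            out.flush();
            return true;
        } catch (IOException e) {
            closeQuietly(socket);
            return false;
        }
    }

    // Closes the socket, ignoring any secondary failure from close().
    static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // Nothing more can be done for this connection.
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Force a deterministic failure: getOutputStream() throws once
            // the output side of this socket has been shut down.
            accepted.shutdownOutput();
            boolean sent = sendOrClose(accepted, new byte[] {1, 2, 3});
            System.out.println("sent=" + sent + " closed=" + accepted.isClosed());
        }
    }
}
```

The point of the design is that every failure branch ends with an explicit close, rather than logging the error and returning with the connection still open.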
> ShuffleHandler becomes unresponsive during gridmix runs and can leak file descriptors
> -------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5584
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5584
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Priority: Blocker
>
> While running gridmix on 2.3 we noticed that jobs are running much slower than normal. We tracked this down to reducers having difficulties shuffling data from maps. Details to follow.