You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/10/15 17:38:44 UTC

[jira] [Commented] (MAPREDUCE-5584) ShuffleHandler becomes unresponsive during gridmix runs and can leak file descriptors

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795315#comment-13795315 ] 

Jason Lowe commented on MAPREDUCE-5584:
---------------------------------------

The CLOSE_WAIT issue is a problem with the ShuffleHandler not closing connections under some error conditions.  A sample backtrace:

{noformat}
2013-10-03 21:15:07,307 [New I/O worker #31] DEBUG org.apache.hadoop.mapred.ShuffleHandler: Ignoring closed channel error
java.nio.channels.ClosedChannelException
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:168)
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:121)
        at org.jboss.netty.channel.Channels.write(Channels.java:704)
        at org.jboss.netty.channel.Channels.write(Channels.java:671)
        at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
        at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:612)
        at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
        at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
{noformat}

> ShuffleHandler becomes unresponsive during gridmix runs and can leak file descriptors
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5584
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5584
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>
> While running gridmix on 2.3 we noticed that jobs are running much slower than normal.  We tracked this down to reducers having difficulties shuffling data from maps.  Details to follow.



--
This message was sent by Atlassian JIRA
(v6.1#6144)