Posted to mapreduce-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2018/11/05 13:41:00 UTC

[jira] [Comment Edited] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675152#comment-16675152 ] 

Peter Bacsko edited comment on MAPREDUCE-7156 at 11/5/18 1:40 PM:
------------------------------------------------------------------

I haven't modified the existing tests or added new ones.

The reason is that Netty uses thread pools and the behaviour is unpredictable. I modified {{TestShuffleHandler.testMaxConnections()}}, but even with the old code it sometimes passes (with {{connAttempts}} = 30), so the failure cannot be reproduced reliably. However, I re-ran it multiple times after this change and there were no NPEs at all.


> NullPointerException when reaching max shuffle connections
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7156
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.9.1, 3.1.1
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7156-001.patch
>
>
>  When you hit the max number of shuffle connections, you can get a lot of NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_e22_1531424278071_55040_01_002295 since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
>         at org.jboss.netty.handler.timeout.IdleStateHandler.writeComplete(IdleStateHandler.java:302)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at org.jboss.netty.channel.Channels.fireWriteComplete(Channels.java:324)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:299)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
>         at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
>         at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
>         at org.jboss.netty.channel.Channels.write(Channels.java:725)
>         at org.jboss.netty.channel.Channels.write(Channels.java:686)
>         at org.jboss.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:1110)
>         at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1252)
>         at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The solution seems to be a one-liner: you have to call {{super.channelOpen(ctx, evt);}} in {{Shuffle.channelOpen()}} in both cases. If we don't do this, {{IdleStateHandler}} is not initialized properly and gets a null attachment object when executing {{writeComplete()}}.
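
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use Netty or Hadoop: {{Stage}}, {{IdleStage}}, {{LimitStage}} and {{writeCompleteFails()}} are hypothetical stand-ins, and reading "both cases" as "limit reached or not" is an assumption. It only models the idea that a downstream handler creates its state on {{channelOpen}} and dereferences it on {{writeComplete}}.

```java
// Self-contained model; none of these classes are Netty or Hadoop types.
class ChannelOpenSketch {

    /** Hypothetical pipeline stage: by default passes each event to the next stage. */
    static class Stage {
        Stage next;

        void channelOpen() {
            if (next != null) next.channelOpen();
        }

        void writeComplete() {
            if (next != null) next.writeComplete();
        }
    }

    /** Plays the role of IdleStateHandler: its state exists only after channelOpen. */
    static class IdleStage extends Stage {
        private long[] lastWriteTime; // stays null if channelOpen never arrives

        @Override
        void channelOpen() {
            lastWriteTime = new long[1];
            super.channelOpen();
        }

        @Override
        void writeComplete() {
            lastWriteTime[0] = System.nanoTime(); // NullPointerException when uninitialized
            super.writeComplete();
        }
    }

    /** Plays the role of Shuffle: rejects the channel once the limit is reached. */
    static class LimitStage extends Stage {
        private final boolean alwaysForward;
        private final int max;
        private final int current;

        LimitStage(boolean alwaysForward, int max, int current) {
            this.alwaysForward = alwaysForward;
            this.max = max;
            this.current = current;
        }

        @Override
        void channelOpen() {
            if (alwaysForward) {
                super.channelOpen(); // proposed behaviour: forward before the limit check
            }
            if (max > 0 && current >= max) {
                return; // rejected; a real handler would also close the channel here
            }
            if (!alwaysForward) {
                super.channelOpen(); // old behaviour: forward only for accepted channels
            }
        }
    }

    /** Opens one channel, then delivers a writeComplete; reports whether it hit an NPE. */
    static boolean writeCompleteFails(boolean alwaysForward, int max, int current) {
        LimitStage limit = new LimitStage(alwaysForward, max, current);
        limit.next = new IdleStage();
        limit.channelOpen();
        try {
            limit.writeComplete();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("old, at limit:   NPE=" + writeCompleteFails(false, 360, 360));
        System.out.println("fixed, at limit: NPE=" + writeCompleteFails(true, 360, 360));
    }
}
```

In this model the old ordering fails only when the limit has been reached, because the rejected channel still receives a {{writeComplete}} (in the stack trace above it comes from the SSL handshake write). Forwarding {{channelOpen}} unconditionally avoids the uninitialized state.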


