Posted to mapreduce-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/11/18 20:22:00 UTC

[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636021#comment-17636021 ] 

ASF GitHub Bot commented on MAPREDUCE-7026:
-------------------------------------------

ashutoshcipher opened a new pull request, #5150:
URL: https://github.com/apache/hadoop/pull/5150

   ### Description of PR
   
   Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
   
   JIRA - MAPREDUCE-7026
   
   ### How was this patch tested?
   
   Verified manually by inspecting the changed log messages.
   
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Assignee: Ashutosh Gupta
>            Priority: Major
>              Labels: supportability
>         Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch, MAPREDUCE-7026.3.patch
>
>
> A job fails because its reduce tasks cannot fetch map output: the NodeManager ShuffleHandler fails to serve the map outputs with an IOException like the one below. The ShuffleHandler puts the actual error message in the response via sendError(), but the Fetcher never logs that message, so the reducer-side logs give no hint of the root cause. (A sketch of reading that message on the fetch side follows the logs below.)
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers :
> java.io.IOException: Error Reading IndexFile
>         at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
>         at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>         at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>         at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>         at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>         at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_000007_0_10004/file.out.index did not match expected owner 'bde'
>         at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
>         at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
>         at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
>         at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
> {code}
> Fetcher logs below, which do not include the actual error message:
> {code}
> 2018-04-19 12:24:39,521 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:369)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:200)
> {code}
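
For illustration only, here is a minimal sketch of how a shuffle client could surface the server-provided error text instead of just the HTTP status. This is not the actual Hadoop patch: the class name, logger, and helper method are hypothetical, and it simply assumes the ShuffleHandler wrote its diagnostic into the HTTP error body via sendError().

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Hypothetical helper: log the error body a shuffle server returns on failure. */
public class ShuffleErrorLogger {
  private static final Logger LOG = LoggerFactory.getLogger(ShuffleErrorLogger.class);

  /** Opens the map-output URL and, on a non-OK status, logs the server's message. */
  static void fetchOrLogError(URL mapOutputUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) mapOutputUrl.openConnection();
    try {
      int rc = conn.getResponseCode();
      if (rc != HttpURLConnection.HTTP_OK) {
        // sendError() on the server side writes the diagnostic into the error stream;
        // reading it here lets the reducer log show the real cause (e.g. the
        // "Owner ... did not match expected owner" message), not just the status code.
        String body = "";
        try (InputStream err = conn.getErrorStream()) {
          if (err != null) {
            body = new String(err.readAllBytes(), StandardCharsets.UTF_8);
          }
        }
        LOG.warn("Shuffle fetch from {} failed: HTTP {} {} - {}",
            mapOutputUrl, rc, conn.getResponseMessage(), body);
        return;
      }
      // ... normal path: read the map output from conn.getInputStream() ...
    } finally {
      conn.disconnect();
    }
  }
}
{code}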


