Posted to mapreduce-issues@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2014/01/01 00:15:50 UTC

[jira] [Commented] (MAPREDUCE-5703) Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859772#comment-13859772 ] 

Jian He commented on MAPREDUCE-5703:
------------------------------------

The likely cause is that the history file was not written successfully while the JHS was copying it, so the JHS has no information about the job. The JobClient queries the JHS about the job once it learns from the RM that the job has finished. Since the job doesn't exist in the JHS, the call throws an NPE. However, restarting a single DN shouldn't have this much impact unless it is the only DN in the cluster.
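
To illustrate the failure mode described above, here is a minimal sketch of a lookup that returns null for a job the history server never loaded. It is not the actual Hadoop code and not a proposed patch for this issue. The class name HistoryLookupSketch, the LOADED_JOBS map, and both methods are hypothetical stand-ins for the JHS job cache and the handler that dereferences its result.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HistoryLookupSketch {
    // Hypothetical stand-in for the jobs the history server has loaded
    // (job id -> completion events). Not a real Hadoop data structure.
    static final Map<String, String[]> LOADED_JOBS = new ConcurrentHashMap<>();

    // Unguarded lookup: if the history file was never written, the job is
    // absent, get() returns null, and the dereference throws an NPE.
    static int unguardedEventCount(String jobId) {
        String[] events = LOADED_JOBS.get(jobId);
        return events.length;
    }

    // Guarded lookup: a missing job is reported as an IOException with a
    // descriptive message instead of a bare NullPointerException.
    static int guardedEventCount(String jobId) throws IOException {
        String[] events = LOADED_JOBS.get(jobId);
        if (events == null) {
            throw new IOException("Unknown job " + jobId
                + ": history is not available on this server");
        }
        return events.length;
    }

    public static void main(String[] args) {
        LOADED_JOBS.put("job_known", new String[] {"event1", "event2"});
        try {
            System.out.println(guardedEventCount("job_known"));
            System.out.println(guardedEventCount("job_missing"));
        } catch (IOException e) {
            // The client sees a message it can act on rather than an NPE.
            System.out.println(e.getMessage());
        }
    }
}
```

With a guard of this kind the client would at least receive an error that names the missing job, though whether the client should then retry or fall back to the status reported by the RM is a separate design question.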

> Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5703
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5703
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>            Reporter: Ashutosh Jindal
>
> 1) Run an MR job.
> 2) After the reduce phase completes and while the JHS file is being written, restart the DN.
> The RM side shows the job as successful.
> The JHS doesn't have info about the job.
> The job client gets an NPE and exits with code 255.
> java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
> 	at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
> 	at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:929)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2080)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2076)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2074)
> 	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
> 	at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382)
> 	at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
> 	at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
> 	at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> 	at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
> 	at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
> 	at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:407)
> 	at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:855)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:835)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)