Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2017/09/01 12:19:00 UTC

[jira] [Commented] (MAPREDUCE-6950) Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150422#comment-16150422 ] 

Jason Lowe commented on MAPREDUCE-6950:
---------------------------------------

Retries of the write are automatically performed by the HDFS client layer before ultimately giving up and bubbling the error up to the application layer.  Looking back in the AM logs you should be able to find indications of this.  Since the HDFS layer is already retrying, the utility of retrying again at the application layer is questionable.
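
To illustrate the layering concern, here is a minimal sketch in plain Java. It is not the HDFS client or the MR AM code: the class LayeredRetrySketch, the Write interface, and the attempt counts (3 inner, 2 outer) are all hypothetical, chosen only to model an inner layer that already retries and an outer layer that retries on top of it. With a fault that persists for the whole window, the outer loop only multiplies the number of raw attempts, and the error still reaches the caller.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LayeredRetrySketch {
    /** A write operation that may fail with an IOException (hypothetical stand-in). */
    interface Write {
        void run() throws IOException;
    }

    /** Runs the write up to maxAttempts times and rethrows the last failure. */
    static void withRetries(int maxAttempts, Write write) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return; // success, no further attempts needed
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /**
     * Counts raw write attempts when an outer retry loop (the application layer
     * in this model) wraps an inner retry loop (the client layer in this model)
     * and the underlying fault never clears.
     */
    static int countAttempts(int innerAttempts, int outerAttempts) {
        AtomicInteger rawAttempts = new AtomicInteger();
        Write alwaysFails = () -> {
            rawAttempts.incrementAndGet();
            throw new IOException("simulated persistent pipeline failure");
        };
        try {
            withRetries(outerAttempts, () -> withRetries(innerAttempts, alwaysFails));
        } catch (IOException expected) {
            // Both layers gave up; the caller still sees the error.
        }
        return rawAttempts.get();
    }

    public static void main(String[] args) {
        // Assumed counts: 3 inner attempts wrapped by 2 outer attempts.
        System.out.println(countAttempts(3, 2)); // prints 6
    }
}
```

In this model the outer loop helps only if the fault clears between outer attempts. A real fix would have to consider whether that is likely for the failure seen in the AM logs.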


> Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6950
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6950
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.7.1
>            Reporter: zhengchenyu
>             Fix For: 2.7.5
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Some jobs report an error like this:
> {code}
> hadoop.mapreduce.Job.monitorAndPrintJob(Job.java 1367) [main] :  map 100% reduce 100%
> [2017-08-31T20:27:12.591+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> [2017-08-31T20:27:12.821+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> [2017-08-31T20:27:13.039+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> [2017-08-31T20:27:13.256+08:00] [ERROR] hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java 1034) [main] : Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
> {code}
> I found the AM container log, shown below. It shows that the error happened in the HDFS write pipeline, possibly because of a DataNode error. I have also seen other causes that close the JobHistoryEventHandler. As a result, the MR AM can't write the job information for the JobHistory server, so the client can't tell whether the application has finished.
> {code}
> 2017-08-31 20:27:10,813 INFO [Thread-1968] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event MAP_ATTEMPT_STARTED
> 2017-08-31 20:27:10,814 ERROR [Thread-1968] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.TaskAttemptStartedEvent@2055ea0a
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2292)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1317)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
> 2017-08-31 20:27:10,814 INFO [Thread-1968] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:580)
>         at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:374) 
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> {code}
> This problem is serious, especially for Hive: the job must be rerun needlessly. So I think we need to retry the operation of writing the history event.


