Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/05/01 19:43:47 UTC

[jira] Updated: (HADOOP-164) Getting errors in reading the output files of a map/reduce job immediately after the job is complete

     [ http://issues.apache.org/jira/browse/HADOOP-164?page=all ]

Doug Cutting updated HADOOP-164:
--------------------------------

    Component: mapred

> Getting errors in reading the output files of a map/reduce job immediately after the job is complete
> ----------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-164
>          URL: http://issues.apache.org/jira/browse/HADOOP-164
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Reporter: Runping Qi

>
> I have an app that fires up map/reduce jobs sequentially. The output of one job is the input of the next.
> I observed that many map tasks failed due to file read errors:
> java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/runping/runping/docs_store/stage_2/base_docs/part-00186
>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:130)
>         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>         at org.apache.hadoop.ipc.Client.call(Client.java:303)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:315)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:302)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:95)
>         at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
>         at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:220)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:146)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:234)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:226)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:36)
>         at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:53)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)
> Those tasks succeeded on the second or third try.
> After inserting a 10-second sleep between consecutive jobs, the problem disappears.
> Here is my code to detect whether a job has completed:
>       try {
>         running = jc.submitJob(job);
>         String jobId = running.getJobID();
>         System.out.println("start job:\t" + jobId);
>         // Poll the job tracker once a second until the job reports completion.
>         while (!running.isComplete()) {
>           try {
>             Thread.sleep(1000);
>           } catch (InterruptedException e) {}
>           running = jc.getJob(jobId);
>         }
>         success = running.isSuccessful();
>       } finally {
>         // Kill the job if it did not finish successfully, then release the client.
>         if (!success && (running != null)) {
>           running.killJob();
>         }
>         jc.close();
>       }
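
Below is a minimal, untested sketch of the workaround described in the report (a fixed pause between consecutive jobs). It reuses only the JobClient/RunningJob calls that appear in the snippet above; the class name SequentialJobRunner and the firstJob/secondJob JobConf parameters are placeholders for illustration, not names from the report.

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SequentialJobRunner {
      /** Runs firstJob to completion, pauses, then submits secondJob. */
      public static void runSequentially(JobClient jc, JobConf firstJob, JobConf secondJob)
          throws IOException, InterruptedException {
        RunningJob running = jc.submitJob(firstJob);
        // Same polling loop as in the report: check completion once a second.
        while (!running.isComplete()) {
          Thread.sleep(1000);
          running = jc.getJob(running.getJobID());
        }
        if (!running.isSuccessful()) {
          throw new IOException("first job failed: " + running.getJobID());
        }
        // Workaround from the report: wait ~10 seconds so the first job's
        // output files are readable before the second job tries to open them.
        Thread.sleep(10000);
        jc.submitJob(secondJob);
      }
    }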

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira