Posted to common-user@hadoop.apache.org by Yuri Pradkin <yu...@isi.edu> on 2008/09/26 22:17:56 UTC
Re: extracting input to a task from a (streaming) job?
I've created a JIRA issue describing my problems running under IsolationRunner:
https://issues.apache.org/jira/browse/HADOOP-4041
If anyone is using IsolationRunner successfully to re-run failed tasks in a
single JVM, can you please, pretty please, describe how you do that?
Thank you,
-Yuri
On Friday 08 August 2008 10:09:48 Yuri Pradkin wrote:
> On Thursday 07 August 2008 16:43:10 John Heidemann wrote:
> > On Thu, 07 Aug 2008 19:42:05 +0200, "Leon Mergen" wrote:
> > >Hello John,
> > >
> > >On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann <jo...@isi.edu> wrote:
> > >> I have a large Hadoop streaming job that generally works fine,
> > >> but a few (2-4) of the ~3000 maps and reduces have problems.
> > >> To make matters worse, the problems are system-dependent (we run on a
> > >> cluster with machines of slightly different OS versions).
> > >> I'd of course like to debug these problems, but they are embedded in a
> > >> large job.
> > >>
> > >> Is there a way to extract the input given to a reducer from a job,
> > >> given the task identity? (This would also be helpful for mappers.)
> > >
> > >I believe you should set "keep.failed.tasks.files" to true -- this way,
> > > given a task id, you can see what input files it has in
> > >~/taskTracker/${taskid}/work (source:
> > >http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#IsolationRunner )
>
> IsolationRunner does not work as described in the tutorial. After the task
> hung, I failed it via the web interface. Then I went to the node that was
> running this task
>
> $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work
> (this path is already different from the tutorial's)
>
> $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)
>
> Looking at IsolationRunner code, I see this:
>
> 164   File workDirName = new File(lDirAlloc.getLocalPathToRead(
> 165       TaskTracker.getJobCacheSubdir()
> 166       + Path.SEPARATOR + taskId.getJobID()
> 167       + Path.SEPARATOR + taskId
> 168       + Path.SEPARATOR + "work",
> 169       conf).toString());
>
> I.e., it expects a task ID subdirectory under the job dir, but:
> $ pwd
> ...mapred/local/taskTracker/jobcache/job_200808071645_0001
> $ ls
> jars job.xml work
>
> -- it's not there. Any suggestions?
>
> Thanks,
>
> -Yuri
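
For reference, the path that the quoted lines 164-169 look for can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the class and method names are made up, "taskTracker/jobcache" is assumed to be what TaskTracker.getJobCacheSubdir() returns (it matches the directory shown in the listing above), "/" stands in for Path.SEPARATOR, and the task id is hypothetical because the failed task's id does not appear in this thread.

```java
import java.io.File;

// Illustrative sketch only; not part of Hadoop.
class ExpectedWorkDir {

    // Mirrors the concatenation in the quoted IsolationRunner code:
    // <jobcache subdir>/<job id>/<task id>/work
    static String expectedWorkDir(String jobCacheSubdir, String jobId, String taskId) {
        return jobCacheSubdir + "/" + jobId + "/" + taskId + "/work";
    }

    public static void main(String[] args) {
        // Root of the local mapred directory; pass it as the first argument.
        String localRoot = args.length > 0 ? args[0] : ".";

        // Assumption: the value of TaskTracker.getJobCacheSubdir().
        String jobCache = "taskTracker/jobcache";
        String jobId = "job_200808071645_0001";
        // Hypothetical task id, used here only to show the shape of the path.
        String taskId = "task_200808071645_0001_m_000000_0";

        String expected = expectedWorkDir(jobCache, jobId, taskId);
        File dir = new File(localRoot, expected);

        System.out.println("expected: " + dir.getPath());
        System.out.println("exists:   " + dir.isDirectory());
    }
}
```

Run against the node above, the "exists" line would presumably print false, since the job directory holds only jars, job.xml and work, with no per-task subdirectory.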