Posted to common-dev@hadoop.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2009/05/26 00:51:45 UTC
[jira] Updated: (HADOOP-4041) IsolationRunner does not work as documented
[ https://issues.apache.org/jira/browse/HADOOP-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Zeyliger updated HADOOP-4041:
------------------------------------
Attachment: HADOOP-4041-v2.patch
Attaching a patch.
I updated Tom's previous patch a little bit to get IsolationRunner to work for map tasks. TestIsolationRunner passes. I'm still running the other tests.
I've also been testing this manually:
{noformat}
$ bin/hadoop jar build/hadoop-0.21.0-dev-examples.jar fail -D keep.failed.task.files=true -failMappers
[lots of noise]
$ bin/hadoop org.apache.hadoop.mapred.IsolationRunner /tmp/hadoop-philip-trunk/mapred/local/taskTracker/jobcache/job_200905251539_0001/attempt_200905251539_0001_m_000000_0/job.xml
09/05/25 15:41:26 INFO mapred.MapTask: io.sort.mb = 100
09/05/25 15:41:26 INFO mapred.MapTask: data buffer = 79691776/99614720
09/05/25 15:41:26 INFO mapred.MapTask: record buffer = 262144/327680
Exception in thread "main" java.lang.RuntimeException: Intentional map failure
    at org.apache.hadoop.examples.FailJob$FailMapper.map(FailJob.java:53)
    at org.apache.hadoop.examples.FailJob$FailMapper.map(FailJob.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:528)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:310)
    at org.apache.hadoop.mapred.IsolationRunner.run(IsolationRunner.java:190)
    at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:202)
{noformat}
(The failure on re-run is what I'd expect, since the map always fails. This is much better than, say, a ClassNotFoundException of some sort, which would indicate that IsolationRunner itself is not working.)
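To make that distinction concrete, here is a minimal sketch in plain Java (no Hadoop dependencies) that walks an exception's cause chain and reports whether it looks like a class-loading problem rather than a failure thrown by the task's own code. The names FailureKind and isClassLoadingFailure are hypothetical and are not part of Hadoop or of the attached patch.

```java
public class FailureKind {
    /**
     * Returns true if t, or any exception in its cause chain, is a
     * class-loading error (ClassNotFoundException or NoClassDefFoundError).
     */
    public static boolean isClassLoadingFailure(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof ClassNotFoundException || t instanceof NoClassDefFoundError) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the failure in the trace above: thrown by the mapper itself.
        Throwable expected = new RuntimeException("Intentional map failure");
        // Hypothetical failure that would point at a broken classpath instead.
        Throwable broken = new RuntimeException(new ClassNotFoundException("FailJob$FailMapper"));

        System.out.println(isClassLoadingFailure(expected)); // false
        System.out.println(isClassLoadingFailure(broken));   // true
    }
}
```

Under this check, the "Intentional map failure" above counts as an expected task failure, while a wrapped ClassNotFoundException would suggest the classpath was not rebuilt correctly.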
I had to rejigger TaskRunner a bit so that the code that generates the classpath can be shared. I suspect that some setup is still not happening for users of the DistributedCache, but I haven't dug in deeply there.
I'd like to propose that we open a separate JIRA for IsolationRunner for reduce tasks. Reducers have to contact mappers to get the intermediate data, and, frankly, that's quite messy. I believe it requires interacting with the job tracker, and that seems like a lot of dependencies for a tool that in theory runs in isolation. So I'd like to get this fixed for mappers first and then tackle reducers separately.
-- Philip
> IsolationRunner does not work as documented
> -------------------------------------------
>
> Key: HADOOP-4041
> URL: https://issues.apache.org/jira/browse/HADOOP-4041
> Project: Hadoop Core
> Issue Type: Bug
> Components: documentation, mapred
> Affects Versions: 0.18.0
> Reporter: Yuri Pradkin
> Attachments: HADOOP-4041-v2.patch, hadoop-4041.patch
>
>
> IsolationRunner does not work as documented in the tutorial.
> The tutorial says "To use the IsolationRunner, first set keep.failed.tasks.files to true (also see keep.tasks.files.pattern)."
> Should be:
> keep.failed.task.files (not tasks)
> After the above was set (quoted from my message on hadoop-core):
> > After the task
> > hung, I failed it via the web interface. Then I went to the node that was
> > running this task
> >
> > $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work
> > (this path is already different from the tutorial's)
> >
> > $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
> > Exception in thread "main" java.lang.NullPointerException
> >     at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)
> >
> > Looking at IsolationRunner code, I see this:
> >
> > 164 File workDirName = new File(lDirAlloc.getLocalPathToRead(
> > 165 TaskTracker.getJobCacheSubdir()
> > 166 + Path.SEPARATOR + taskId.getJobID()
> > 167 + Path.SEPARATOR + taskId
> > 168 + Path.SEPARATOR + "work",
> > 169 conf).toString());
> >
> > I.e. it assumes there is supposed to be a taskID subdirectory under the job
> > dir, but:
> > $ pwd
> > ...mapred/local/taskTracker/jobcache/job_200808071645_0001
> > $ ls
> > jars job.xml work
> >
> > -- it's not there.
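The mismatch described in the quoted report can be sketched as follows. This is a minimal illustration in plain Java using java.nio.file rather than Hadoop's Path and LocalDirAllocator; the class name WorkDirLayout, its method names, and the task attempt ID are hypothetical, and the elided "..." prefix of the real jobcache path is deliberately left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WorkDirLayout {
    /** Layout the quoted code looks up: <jobcache>/<jobId>/<taskId>/work */
    public static Path perTaskWorkDir(Path jobCache, String jobId, String taskId) {
        return jobCache.resolve(jobId).resolve(taskId).resolve("work");
    }

    /** Layout seen on disk in the report: <jobcache>/<jobId>/work */
    public static Path perJobWorkDir(Path jobCache, String jobId) {
        return jobCache.resolve(jobId).resolve("work");
    }

    public static void main(String[] args) {
        // Relative stand-in for the jobcache directory; the real prefix is elided in the report.
        Path jobCache = Paths.get("taskTracker", "jobcache");
        String jobId = "job_200808071645_0001";
        // Hypothetical attempt ID, used only to show where the extra path component goes.
        String taskId = "attempt_200808071645_0001_m_000000_0";

        System.out.println("looked up: " + perTaskWorkDir(jobCache, jobId, taskId));
        System.out.println("on disk:   " + perJobWorkDir(jobCache, jobId));
    }
}
```

The two paths differ only by the task ID component, which is the subdirectory the report found missing under the job directory.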