Posted to common-user@hadoop.apache.org by Jason Fennell <jd...@gmail.com> on 2009/03/26 01:25:32 UTC

Identify the input file for a failed mapper/reducer

Is there a way to identify the input file a mapper was running on when
it failed?  When a large job fails because of bad input lines, I have
to resort to rerunning the entire job to isolate a single bad line
(since the log doesn't say which file the failing mapper was
running on).

Basically, I would like to be able to do one of the following:
1. Find the file that a mapper was running on when it failed
2. Find the block that a mapper was running on when it failed (and be
able to find file names from block ids)

I haven't been able to find any documentation on facilities to
accomplish either (1) or (2), so I'm hoping someone on this list will
have a suggestion.

I am using the Hadoop streaming API on Hadoop 0.18.2.

-Jason

Re: Identify the input file for a failed mapper/reducer

Posted by Rasit OZDAS <ra...@gmail.com>.
Two quotes for this problem:

"Streaming map tasks should have a "map_input_file" environment
variable like the following:
map_input_file=hdfs://HOST/path/to/file"

"the value for map.input.file gives you the exact information you need."

(I haven't tried this myself.)
Rasit
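
[Editor's sketch, not part of the original reply.] Streaming passes job
configuration values to the task as environment variables, with dots
replaced by underscores, which is why map.input.file shows up as
map_input_file. A minimal Python mapper that reports the file when a
record fails might look like the following. The function names and the
"key<TAB>integer" record format are hypothetical, and the fallback to
mapreduce_map_input_file is an assumption for newer Hadoop releases
rather than something needed on 0.18.2.

```python
#!/usr/bin/env python
"""Streaming mapper sketch: report the input file when a record fails."""
import os
import sys


def input_file(environ=os.environ):
    # map.input.file is exported to streaming tasks as map_input_file.
    # The second name is an assumption covering newer Hadoop releases.
    return (environ.get("map_input_file")
            or environ.get("mapreduce_map_input_file")
            or "<unknown>")


def process(line):
    # Hypothetical per-record logic: expects "key<TAB>integer".
    key, value = line.rstrip("\n").split("\t")
    return "%s\t%d" % (key, int(value))


def run(lines, out=sys.stdout, err=sys.stderr, environ=os.environ):
    source = input_file(environ)
    for number, line in enumerate(lines, 1):
        try:
            out.write(process(line) + "\n")
        except Exception as exc:
            # Anything written to stderr ends up in the task's log.
            err.write("bad record in %s (record %d of this split): %r: %s\n"
                      % (source, number, line, exc))
            raise  # still fail the task, but with the file name logged


if __name__ == "__main__":
    run(sys.stdin)
```

The record number is counted from the start of the mapper's split, not
from the start of the file, so for a large file it narrows the search
rather than pinpointing the line on its own.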

2009/3/26 Jason Fennell <jd...@gmail.com>:
> Is there a way to identify the input file a mapper was running on when
> it failed?  When a large job fails because of bad input lines I have
> to resort to rerunning the entire job to isolate a single bad line
> (since the log doesn't contain information on the file that that
> mapper was running on).
>
> Basically, I would like to be able to do one of the following:
> 1. Find the file that a mapper was running on when it failed
> 2. Find the block that a mapper was running on when it failed (and be
> able to find file names from block ids)
>
> I haven't been able to find any documentation on facilities to
> accomplish either (1) or (2), so I'm hoping someone on this list will
> have a suggestion.
>
> I am using the Hadoop streaming API on hadoop 0.18.2.
>
> -Jason
>



-- 
M. Raşit ÖZDAŞ