Posted to mapreduce-user@hadoop.apache.org by Bejoy Ks <be...@gmail.com> on 2012/01/16 14:16:18 UTC

Identify splits processed by each mapper

Hi Experts
      A quick question: I have quite a few MapReduce jobs running on my
cluster. One job's input alone has a large number of files, and I'd like
to know which split was processed by each map task (for successful,
failed, and killed tasks) without doing any custom logging. I tried
digging into the JobTracker web UI, but it only gave me the input split
location, which specifies the nodes on which the split is located; what
I'm looking for is the file name and which split of that file.

Where can I find this information?
Is it available, or can I make it available in jobdetails.jsp?
Do I need to enable some configuration parameter to display it?
Is it possible only with custom logging, or does the Hadoop framework
provide this?


Thank You

Regards
Bejoy.KS

Re: Identify splits processed by each mapper

Posted by Bejoy Ks <be...@gmail.com>.
Thanks Harsh

I wanted to use this feature if it was already available in MapReduce
before going in for custom logging. I may have to go with custom logging
for the moment.

I have filed a JIRA for this. Please review and update it if it requires
more details.
https://issues.apache.org/jira/browse/MAPREDUCE-3678

Regards
Bejoy.K.S

On Mon, Jan 16, 2012 at 10:42 PM, Harsh J <ha...@cloudera.com> wrote:

> Bejoy,
>
> On 16-Jan-2012, at 6:46 PM, Bejoy Ks wrote:
>
>       A quick question: I have quite a few MapReduce jobs running on my
> cluster. One job's input alone has a large number of files, and I'd like
> to know which split was processed by each map task (for successful,
> failed, and killed tasks) without doing any custom logging. I tried
> digging into the JobTracker web UI, but it only gave me the input split
> location, which specifies the nodes on which the split is located; what
> I'm looking for is the file name and which split of that file.
>
>
> Initially the status of a task (set via the reporter) is the FileSplit's
> path plus offset and length, but that's all.
>
> Where can I find this information?
>
>
> Unfortunately, none of this is logged by default. Please file a JIRA to
> have it added and to discuss how to add it (do follow up on this thread
> with the ID).
>
> Is it available, or can I make it available in jobdetails.jsp?
>
>
> No, but you can write a short utility program that emulates the splitter
> and prints the mapping.
>
> Do I need to enable some configuration parameter to display it?
>
>
> No, as far as I know there is none.
>
> Is it possible only with custom logging, or does the Hadoop framework
> provide this?
>
>
> The framework does not provide this, so custom logging is the easiest
> way, if that is an option for you.
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
>
>

Re: Identify splits processed by each mapper

Posted by Harsh J <ha...@cloudera.com>.
Bejoy,

On 16-Jan-2012, at 6:46 PM, Bejoy Ks wrote:
>       A quick question: I have quite a few MapReduce jobs running on my cluster. One job's input alone has a large number of files, and I'd like to know which split was processed by each map task (for successful, failed, and killed tasks) without doing any custom logging. I tried digging into the JobTracker web UI, but it only gave me the input split location, which specifies the nodes on which the split is located; what I'm looking for is the file name and which split of that file.

Initially the status of a task (set via the reporter) is the FileSplit's path plus offset and length, but that's all.

> Where can I find this information?

Unfortunately, none of this is logged by default. Please file a JIRA to have it added and to discuss how to add it (do follow up on this thread with the ID).

> Is it available, or can I make it available in jobdetails.jsp?

No, but you can write a short utility program that emulates the splitter and prints the mapping.
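As a rough illustration, such a utility could reproduce the split arithmetic that FileInputFormat applies to a single splittable file: splits of one split-size each, with the last split allowed to grow to 1.1x the split size (the SPLIT_SLOP rule) so no tiny trailing split is created. The class and method names below are hypothetical, and a real emulation would also have to account for block locations, min/max split size settings, and non-splittable (e.g. compressed) files:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: computes (offset, length) pairs the way
// FileInputFormat does for one splittable file. Not Hadoop's actual
// class; it only mirrors the core arithmetic.
public class SplitEmulator {

    // FileInputFormat lets the final split be up to 1.1 * splitSize.
    static final double SPLIT_SLOP = 1.1;

    static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        // Carve off full-sized splits while more than 1.1 split-sizes remain.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(new long[]{fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (up to 1.1 * splitSize) becomes the last split.
        if (remaining > 0) {
            splits.add(new long[]{fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with a 128 MB split size yields three splits;
        // the last one covers the trailing 44 MB.
        for (long[] s : computeSplits(300 * mb, 128 * mb)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Printing that mapping alongside the task IDs from the job history would then tie each map task back to its file region.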

> Do I need to enable some configuration parameter to display it?

No, as far as I know there is none.

> Is it possible only with custom logging, or does the Hadoop framework provide this?

The framework does not provide this, so custom logging is the easiest way, if that is an option for you.
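For the custom-logging route, a minimal sketch with the new org.apache.hadoop.mapreduce API is to record the split in setup(). SplitLoggingMapper is a hypothetical name, the key/value types are placeholders, and the check assumes a file-based input format so the split is a FileSplit; compiling it needs the Hadoop client jars on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper that logs which file region this task processes.
// The line ends up in the task attempt's logs, viewable per attempt
// from the JobTracker web UI.
public class SplitLoggingMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // For file-based input formats the split is a FileSplit,
        // which carries the path, start offset, and length.
        if (context.getInputSplit() instanceof FileSplit) {
            FileSplit split = (FileSplit) context.getInputSplit();
            System.err.println("Processing " + split.getPath()
                    + " offset=" + split.getStart()
                    + " length=" + split.getLength());
        }
    }

    // map() omitted: the job's normal logic goes here.
}
```

With the old mapred API, similar details should also be visible to the task through the map.input.file, map.input.start, and map.input.length properties of the job configuration, without any code change in the mapper body.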

--
Harsh J
Customer Ops. Engineer, Cloudera