Posted to mapreduce-user@hadoop.apache.org by arun k <ar...@gmail.com> on 2011/12/03 08:09:46 UTC

Re: Capturing Map/reduce task run times and bytes read

Harsh,

Sorry for the confusion.
My situation: I have a single-node setup, and I added System.out.println
statements to MapTask.java and ReduceTask.java. Then:
{HADOOP_HOME}$ ant build
{HADOOP_HOME}$ start all daemons
{HADOOP_HOME}$ run wordcount example

Yes, I am able to see the output in the tasktrackers' *.out files.

Q> Is the map/reduce task run time displayed in the web GUI accurate
enough?
Q> If I want to find the I/O rate of a task, will the total number of FILE
bytes and HDFS bytes read/written divided by the task run time give it
approximately?
Q> Does the FILE bytes read counter for a reduce task count the map output
bytes fetched non-locally over the network, or the bytes read from local
disk after the map output has been copied locally?

Thanks,
Arun

Re: Capturing Map/reduce task run times and bytes read

Posted by arun k <ar...@gmail.com>.
Harsh,

I wanted to confirm this because, if the displayed run time isn't accurate,
I want to write code to capture it myself.

Does it make sense to classify a map/reduce task as I/O-bound or CPU-bound
based on its I/O rate?

Arun

On Sat, Dec 3, 2011 at 2:43 PM, Harsh J <ha...@cloudera.com> wrote:

> Arun,
>
> Inline again.
>
> On 03-Dec-2011, at 12:39 PM, arun k wrote:
>
>
> Q> Is the map/reduce task run time displayed in the web GUI accurate
> enough?
>
>
> Don't see why not. We only display what's been genuinely collected. What
> you get out of an API on the CLI is absolutely the same thing. Or perhaps I
> do not understand your question completely here - what's led you to ask
> this?
>
> Q> If I want to find the I/O rate of a task, will the total number of FILE
> bytes and HDFS bytes read/written divided by the task run time give it
> approximately?
>
>
> Yes, that should give you a stop-watch measure. Task start -> Task end,
> and the counters the task puts up for itself.
>
> Q> Does the FILE bytes read counter for a reduce task count the map output
> bytes fetched non-locally over the network, or the bytes read from local
> disk after the map output has been copied locally?
>
>
> FILE counters are from whatever is read off a local filesystem (file:///),
> so would mean the latter. If you look again, you will notice another
> counter named "Reduce shuffle bytes" that gives you the former count -
> separately.
>

Re: Capturing Map/reduce task run times and bytes read

Posted by Harsh J <ha...@cloudera.com>.
Arun,

Inline again.

On 03-Dec-2011, at 12:39 PM, arun k wrote:
> 
> Q> Is the map/reduce task run time displayed in the web GUI accurate enough?

Don't see why not. We only display what's been genuinely collected. What you get out of an API on the CLI is absolutely the same thing. Or perhaps I do not understand your question completely here - what's led you to ask this?

> Q> If I want to find the I/O rate of a task, will the total number of FILE bytes and HDFS bytes read/written divided by the task run time give it approximately?

Yes, that should give you a stop-watch measure. Task start -> Task end, and the counters the task puts up for itself.
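
For illustration, the stop-watch measure can be sketched as total bytes over elapsed wall-clock time. This is only a rough sketch under assumptions: the class name TaskIoRate, the method name, and the sample numbers are all made up, and the byte totals and timestamps are assumed to have been copied by hand from the task's counters and start/finish times. Nothing here calls a Hadoop API.

```java
// Sketch only: TaskIoRate is a hypothetical helper, not part of Hadoop.
// Inputs are assumed to be read off the task's counters and its
// start/finish timestamps (in milliseconds) by hand.
final class TaskIoRate {
    private TaskIoRate() {}

    /** Bytes moved per second of wall-clock task time. */
    static double bytesPerSecond(long totalBytes, long startMillis, long finishMillis) {
        long elapsedMillis = finishMillis - startMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("finish time must be after start time");
        }
        // Multiply by 1000.0 first so the division is done in floating point.
        return totalBytes * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Made-up sample values for a single task.
        long fileBytes = 40L * 1024 * 1024; // FILE bytes read + written
        long hdfsBytes = 64L * 1024 * 1024; // HDFS bytes read + written
        double rate = bytesPerSecond(fileBytes + hdfsBytes, 1_000L, 27_000L);
        System.out.printf("%.2f MB/s%n", rate / (1024 * 1024));
    }
}
```

Since the elapsed time covers the whole task, including compute, this gives an average over the task's lifetime rather than the throughput of the disks themselves.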

> Q> Does the FILE bytes read counter for a reduce task count the map output bytes fetched non-locally over the network, or the bytes read from local disk after the map output has been copied locally?

FILE counters are from whatever is read off a local filesystem (file:///), so would mean the latter. If you look again, you will notice another counter named "Reduce shuffle bytes" that gives you the former count - separately.
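
To make the distinction concrete, here is a small sketch that keeps the two figures apart for one reduce task. It assumes the counter values were copied by hand into a map keyed by the names FILE_BYTES_READ and REDUCE_SHUFFLE_BYTES (the names I would expect behind the "FILE bytes read" and "Reduce shuffle bytes" displays; check them against your version). The class ReduceInputBreakdown and the sample numbers are hypothetical, and no Hadoop API is used.

```java
import java.util.Map;

// Sketch only: ReduceInputBreakdown is a hypothetical helper. The counter
// names used as keys are assumptions about how the values were exported.
final class ReduceInputBreakdown {
    private ReduceInputBreakdown() {}

    /** Bytes read from the local filesystem (file:///), e.g. after the copy. */
    static long localFileBytesRead(Map<String, Long> counters) {
        return counters.getOrDefault("FILE_BYTES_READ", 0L);
    }

    /** Map output bytes fetched during the shuffle. */
    static long shuffleBytes(Map<String, Long> counters) {
        return counters.getOrDefault("REDUCE_SHUFFLE_BYTES", 0L);
    }

    public static void main(String[] args) {
        // Made-up sample values for a single reduce task.
        Map<String, Long> counters = Map.of(
                "FILE_BYTES_READ", 5_000_000L,
                "REDUCE_SHUFFLE_BYTES", 3_000_000L);
        System.out.println("local reads:   " + localFileBytesRead(counters));
        System.out.println("shuffle bytes: " + shuffleBytes(counters));
    }
}
```

The two values are reported side by side rather than summed, because they may describe the same map output at different stages (fetched, then read back from local disk).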