Posted to common-dev@hadoop.apache.org by abhishek sharma <ab...@usc.edu> on 2010/04/04 05:11:28 UTC

measuring the split reading time in Hadoop

Hi all,

I wanted to measure the time it takes to read the input split for a map
task. For my cluster, I am interested in measuring the overhead of
fetching the input to a map task over the network as opposed to
reading from the local disk.

Is there an easy way to instrument some function to log this
information (say, in the TaskTracker logs)?

Thanks,
Abhishek

Re: measuring the split reading time in Hadoop

Posted by Eric Sammer <es...@cloudera.com>.
Abhishek:

This may not be entirely accurate, since the task run time includes
work other than reading the split, but simply comparing the run time
of local tasks vs. non-local tasks should give you a rough estimate.
Both task locality and task run times can be found in the JT web UI.
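
As a rough illustration of that comparison, here is a minimal Python
sketch. It assumes you have already copied each map task's locality
and run time out of the JT web UI by hand; the task ids, the numbers,
and the locality_overhead helper are all hypothetical and not part of
Hadoop.

```python
from statistics import mean

# Hypothetical records copied by hand from the JobTracker web UI:
# (task id, was the task data-local?, run time in seconds).
tasks = [
    ("m_000000", True, 40.0),
    ("m_000001", True, 42.0),
    ("m_000002", False, 50.0),
    ("m_000003", False, 54.0),
]

def locality_overhead(records):
    """Return mean non-local run time minus mean local run time (seconds).

    This is only a rough estimate: run time also covers task setup,
    the map function itself, and writing map output.
    """
    local = [secs for _, is_local, secs in records if is_local]
    remote = [secs for _, is_local, secs in records if not is_local]
    if not local or not remote:
        raise ValueError("need at least one local and one non-local task")
    return mean(remote) - mean(local)

if __name__ == "__main__":
    print("estimated overhead (s):", locality_overhead(tasks))
```

The estimate is more meaningful when the tasks being compared process
splits of similar size and run the same map function.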

Hope this helps.


-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com