Posted to common-user@hadoop.apache.org by Bryan Duxbury <br...@rapleaf.com> on 2009/02/12 22:10:00 UTC

Measuring IO time in map/reduce jobs?

Hey all,

Does anyone have any experience measuring the IO time spent in
their map/reduce jobs? I know how to profile a sample of map and
reduce tasks, but that appears to exclude IO time. Just subtracting
the total CPU time from the total run time of a task seems like too
coarse an approach.

-Bryan

Re: Measuring IO time in map/reduce jobs?

Posted by jdd dhok <jd...@gmail.com>.
Hi,
The Linux kernel exposes per-task delay accounting information (including
time spent waiting on block IO) to user space through a netlink socket,
via the taskstats interface. You can read more about it here:
http://www.mjmwired.net/kernel/Documentation/accounting/taskstats.txt.
I think there's a Python tool called iotop that uses this feature.
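
As a rough illustration, the block IO part of the delay accounting data can
also be read without a netlink client: proc(5) documents field 42 of
/proc/<pid>/stat (delayacct_blkio_ticks) as the aggregated block IO delay in
clock ticks. The sketch below is not the taskstats interface itself, just a
simpler way to peek at the same kind of counter. The function names are made
up for this example, and it assumes a kernel built with delay accounting
support; on some newer kernels the feature reportedly has to be switched on
explicitly, and the field reads 0 when it is off.

```python
import os


def parse_stat_line(line):
    """Pull CPU and block IO delay tick counters out of a /proc/<pid>/stat line.

    Field numbers follow proc(5): 14 = utime, 15 = stime,
    42 = delayacct_blkio_ticks.
    """
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return {
        "utime_ticks": int(rest[14 - 3]),
        "stime_ticks": int(rest[15 - 3]),
        "blkio_delay_ticks": int(rest[42 - 3]),
    }


def task_times_seconds(stat_path):
    """Read one stat file and convert its tick counters to seconds."""
    with open(stat_path) as f:
        ticks = parse_stat_line(f.read())
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second on this system
    return {name.replace("_ticks", "_s"): value / hz
            for name, value in ticks.items()}
```

For a task JVM with a hypothetical pid of 12345, calling
task_times_seconds("/proc/12345/stat") before and after a run and taking the
difference gives an estimate of time blocked on disk IO next to user and
system CPU time. As far as I can tell the delay counter is kept per thread,
so for a multi-threaded JVM the files under /proc/12345/task/<tid>/stat may
need to be summed. Network waits (for example remote HDFS reads) are not
block IO and would not show up here.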

Hope this helps.

Regards,
Jaideep



-- 
- JDD