Posted to common-user@hadoop.apache.org by Antonio D'Ettole <co...@gmail.com> on 2010/03/17 12:47:11 UTC

Measuring running times

Hi everybody,
as part of my project work at school I'm running some Hadoop jobs on a
cluster. I'd like to measure exactly how long each phase of the process
takes: mapping, shuffling (ideally divided in copying and sorting) and
reducing. The tasktracker logs do not seem to supply the start/end times for
each phase, at least not all of them, even when the log level is set to
DEBUG.
Do you have any ideas on how I could work this out?
Thanks
Antonio

Re: Measuring running times

Posted by Simone Leo <si...@crs4.it>.
At the default log level, Hadoop job logs (the ones you also get in the
job's output directory under _logs/history) contain entries like the
following:

ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002"
TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0"
START_TIME="1220331166789"
HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809020551_0008_r_000002"
TASK_ATTEMPT_ID="task_200809020551_0008_r_000002_0"
TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1220332036001"
SORT_FINISHED="1220332036014" FINISH_TIME="1220332063254"
HOSTNAME="tracker_foo.bar.com:localhost/127.0.0.1:44755"

You get start time, shuffle finish time, sort finish time and overall
finish time. Similarly, you get start and finish time for MapAttempt
entries.
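For example, a small script along these lines could pull the phase durations out of such a history file. This is only a sketch based on the attribute names in the entries above (START_TIME, SHUFFLE_FINISHED, SORT_FINISHED, FINISH_TIME, all in milliseconds); the function name and file path are made up, and the start and finish records for one attempt are separate lines that have to be merged by attempt id:

```python
import re

# Key="value" pairs as they appear in the history-log lines above.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def reduce_phase_times(history_path):
    """Return {TASK_ATTEMPT_ID: (shuffle_s, sort_s, reduce_s)} in seconds."""
    attempts = {}
    with open(history_path) as f:
        for line in f:
            if not line.startswith("ReduceAttempt"):
                continue
            attrs = dict(ATTR.findall(line))
            aid = attrs.get("TASK_ATTEMPT_ID")
            if aid:
                # Start and finish are logged as separate records for the
                # same attempt, so merge them by attempt id.
                attempts.setdefault(aid, {}).update(attrs)
    times = {}
    for aid, a in attempts.items():
        needed = {"START_TIME", "SHUFFLE_FINISHED", "SORT_FINISHED",
                  "FINISH_TIME"}
        if not needed <= a.keys():
            continue  # failed attempt or incomplete record
        start = int(a["START_TIME"])
        shuffle = int(a["SHUFFLE_FINISHED"])
        sort = int(a["SORT_FINISHED"])
        finish = int(a["FINISH_TIME"])
        times[aid] = ((shuffle - start) / 1000.0,   # copy/shuffle phase
                      (sort - shuffle) / 1000.0,    # sort phase
                      (finish - sort) / 1000.0)     # reduce phase
    return times
```

For the sample entry above this would give roughly 869 s of shuffle, 13 ms of sort and 27 s of reduce for attempt task_200809020551_0008_r_000002_0.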

Hope this helps,

Simone

On 03/17/10 12:47, Antonio D'Ettole wrote:
> Hi everybody,
> as part of my project work at school I'm running some Hadoop jobs on a
> cluster. I'd like to measure exactly how long each phase of the process
> takes: mapping, shuffling (ideally divided in copying and sorting) and
> reducing. The tasktracker logs do not seem to supply the start/end times for
> each phase, at least not all of them, even when the log level is set to
> DEBUG.
> Do you have any ideas on how I could work this out?
> Thanks
> Antonio
> 


-- 
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: simleo@crs4.it
http://www.crs4.it

Re: Measuring running times

Posted by Antonio D'Ettole <co...@gmail.com>.
>
> At the default log level, Hadoop job logs (the ones you also get in the
> job's output directory under _logs/history)


Thanks Simone, that's exactly what I was looking for.

> Look at the job history logs. They break down the times for each task


I understand you guys are talking about the same thing? I'm using the file
in /outputDir/_logs/history . Interestingly, before you told me, I was
convinced that was actually a .jar archive, so it took me a little while to
figure out where these history logs were :)

Thanks again folks!
Antonio

On Wed, Mar 17, 2010 at 4:45 PM, Owen O'Malley <om...@apache.org> wrote:

>
> On Mar 17, 2010, at 4:47 AM, Antonio D'Ettole wrote:
>
>  Hi everybody,
>> as part of my project work at school I'm running some Hadoop jobs on a
>> cluster. I'd like to measure exactly how long each phase of the process
>> takes: mapping, shuffling (ideally divided in copying and sorting) and
>> reducing.
>>
>
> Look at the job history logs. They break down the times for each task. You
> need to run a script to aggregate them. You can see an example of the
> aggregation on my petabyte sort description:
>
>
> http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
>
> -- Owen
>

Re: Measuring running times

Posted by Owen O'Malley <om...@apache.org>.
On Mar 17, 2010, at 4:47 AM, Antonio D'Ettole wrote:

> Hi everybody,
> as part of my project work at school I'm running some Hadoop jobs on a
> cluster. I'd like to measure exactly how long each phase of the  
> process
> takes: mapping, shuffling (ideally divided in copying and sorting) and
> reducing.

Look at the job history logs. They break down the times for each task.  
You need to run a script to aggregate them. You can see an example of  
the aggregation on my petabyte sort description:

http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html

-- Owen
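
The kind of aggregation Owen describes can be sketched roughly as follows. This is not his actual script: it just counts, for each second of the job, how many reduce attempts were in each phase, given per-attempt interval tuples (millisecond timestamps) of the sort one can parse out of the history log. Function and variable names are invented for illustration:

```python
from collections import Counter

def phase_activity(intervals):
    """intervals: iterable of (start, shuffle_end, sort_end, finish),
    all in milliseconds. Returns {phase: Counter(second -> attempts)}."""
    activity = {"shuffle": Counter(), "sort": Counter(), "reduce": Counter()}
    for start, shuffle_end, sort_end, finish in intervals:
        spans = {"shuffle": (start, shuffle_end),
                 "sort": (shuffle_end, sort_end),
                 "reduce": (sort_end, finish)}
        for phase, (a, b) in spans.items():
            # Mark every whole second this attempt spent in this phase.
            for sec in range(a // 1000, b // 1000 + 1):
                activity[phase][sec] += 1
    return activity
```

Plotting each Counter over time gives the per-phase task-count curves shown in the petabyte sort write-up linked above.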