You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by Vasilis Liaskovitis <vl...@gmail.com> on 2009/09/10 19:09:36 UTC

duplicate START_TIME, FINISH_TIME timestamps in history log

Hi,

I am getting different values for START_TIME, FINISH_TIME regarding
the exact same task when looking at the history log of a sorter job.
 E.g greping for a particular reduce task in the history log:

masternode:/home/hadoop-git # grep -r
"task_200909031613_0002_r_000002"
/home/vliaskov/hadoop-git/logs/history/doomhead.amd.com_1252012387435_job_200909031613_0002_root_sorter

Task TASKID="task_200909031613_0002_r_000002" TASK_TYPE="REDUCE"
START_TIME="1252014285633" SPLITS="" .
ReduceAttempt TASK_TYPE="REDUCE"
TASKID="task_200909031613_0002_r_000002"
TASK_ATTEMPT_ID="attempt_200909031613_0002_r_000002_0"
START_TIME="1252012959359"
TRACKER_NAME="tracker_compute-5\.doomcluster\.amd\.com:localhost/127\.0\.0\.1:37388"
HTTP_PORT="50060" .
ReduceAttempt TASK_TYPE="REDUCE"
TASKID="task_200909031613_0002_r_000002"
TASK_ATTEMPT_ID="attempt_200909031613_0002_r_000002_0"
TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1252014856244"
SORT_FINISHED="1252014856723" FINISH_TIME="1252015219858"
Task TASKID="task_200909031613_0002_r_000002" TASK_TYPE="REDUCE"
TASK_STATUS="SUCCESS" FINISH_TIME="1252016563501" ...

>From the above you can see that e.g. START_TIME is reported twice and
is different.
START_TIME= 1252014285633
START_TIME= 1252012959359

Same goes for FINISH_TIME and possibly other timestamped events. Does
the history log gather messages from several tasktrackers and from the
jobtracker regarding one particular task? That might explain the log.

Could this be a problem because the cluster nodes are not synchronized
to a master node using e.g. NTP?
On that note, do people use ntpd and ntpq/ntpc to synchronize their
slave/tasktracker nodes' clocks to their master/jobtracker node clock?

thanks for any help,

- Vasilis

Re: duplicate START_TIME, FINISH_TIME timestamps in history log

Posted by Amar Kamat <am...@yahoo-inc.com>.
Vasilis Liaskovitis wrote:
> Hi,
>
> I am getting different values for START_TIME, FINISH_TIME regarding
> the exact same task when looking at the history log of a sorter job.
>  E.g greping for a particular reduce task in the history log:
>
> masternode:/home/hadoop-git # grep -r
> "task_200909031613_0002_r_000002"
> /home/vliaskov/hadoop-git/logs/history/doomhead.amd.com_1252012387435_job_200909031613_0002_root_sorter
>
> Task TASKID="task_200909031613_0002_r_000002" TASK_TYPE="REDUCE"
> START_TIME="1252014285633" SPLITS="" .
>   
this is line#1
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_200909031613_0002_r_000002"
> TASK_ATTEMPT_ID="attempt_200909031613_0002_r_000002_0"
> START_TIME="1252012959359"
> TRACKER_NAME="tracker_compute-5\.doomcluster\.amd\.com:localhost/127\.0\.0\.1:37388"
> HTTP_PORT="50060" .
>   
The start time is the tip start time (time when the task is scheduled at 
JobTracker)
this is line#2
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_200909031613_0002_r_000002"
> TASK_ATTEMPT_ID="attempt_200909031613_0002_r_000002_0"
> TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1252014856244"
> SORT_FINISHED="1252014856723" FINISH_TIME="1252015219858"
>   
The start time here is the attempt start time which is the time when the 
attempt started at the tracker.
this is line#3
> Task TASKID="task_200909031613_0002_r_000002" TASK_TYPE="REDUCE"
> TASK_STATUS="SUCCESS" FINISH_TIME="1252016563501" ...
>   
This is line#4
Amar
> From the above you can see that e.g. START_TIME is reported twice and
> is different.
> START_TIME= 1252014285633
> START_TIME= 1252012959359
>
> Same goes for FINISH_TIME and possibly other timestamped events. Does
> the history log gather messages from several tasktrackers and from the
> jobtracker regarding one particular task? That might explain the log.
>
> Could this be a problem because the cluster nodes are not synchronized
> to a master node using e.g. NTP?
> On that note, do people use ntpd and ntpq/ntpc to synchronize their
> slave/tasktracker nodes' clocks to their master/jobtracker node clock?
>
> thanks for any help,
>
> - Vasilis
>