Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2008/07/08 14:04:31 UTC

[jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611539#action_12611539 ] 

Devaraj Das commented on HADOOP-3245:
-------------------------------------

Some initial comments:
1) Remove the unnecessary comments from JobTracker.java
2) Rename the "restarted" field as "recovering"
3) hasJobTrackerRestarted/Recovered API
4) Remove the comment: "//TODO wait for all the incomplete(previously running) jobs to be ready" from offerService
5) Put back the call to completedJobStatusStore.store in finalizeJob
6) The method cleanupJob seems unnecessary. What is already done w.r.t cleanup will continue to work.
7) The implementation of wasRecovered and hasRecovered should not call back into the JobTracker
8) Synchronization for tasksInited in initTasks is redundant. Do a notify instead of notifyAll in the following line.
9) In the interval between the JT death and restart, the reducers might fail to fetch map outputs from some tasktrackers (due to faulty map nodes, etc.), but they have no one to send the notifications to. The reducers might end up killing themselves after a couple of retries.
10) The construction of TaskTrackerStatus should be reverted to how it was done earlier (cloneAndResetRunningTaskStatuses called inline with the constructor invocation)
11) In TaskTracker.transmitHeartBeat you should call cloneAndResetRunningJobTaskStatuses rather than cloneAndResetRunningTaskStatuses
12) Please move the SYNC action handling to the offerService method
13) shouldResetEventsIndex could be cleared upon the first access as opposed to doing it in the heartbeat processing
14) Instead of the additional RPC in Umbilical, you can add an arg to getMapCompletionEvents to know whether to reset or not
15) Factor out common code from cloneAndResetRunningJobTaskStatuses/cloneAndResetRunningTaskStatuses
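
To illustrate item 13, here is a minimal sketch of a flag that is cleared on its first access rather than during heartbeat processing. The class name ResetOnFirstAccess and the methods markReset and consumeReset are hypothetical and are not taken from the patch; only the field name shouldResetEventsIndex comes from the comment above. The use of AtomicBoolean is also an assumption, chosen so that reading and clearing happen in one step.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; this is not the TaskTracker code from the patch.
public class ResetOnFirstAccess {
    // True when the events index should be reset (for example, after a JT restart).
    private final AtomicBoolean shouldResetEventsIndex = new AtomicBoolean(false);

    // Assumed trigger: called when a restart is detected.
    public void markReset() {
        shouldResetEventsIndex.set(true);
    }

    // Returns the current value and clears it atomically,
    // so only the first caller after markReset() sees true.
    public boolean consumeReset() {
        return shouldResetEventsIndex.getAndSet(false);
    }

    public static void main(String[] args) {
        ResetOnFirstAccess flag = new ResetOnFirstAccess();
        flag.markReset();
        System.out.println(flag.consumeReset()); // first access after markReset()
        System.out.println(flag.consumeReset()); // later accesses see the cleared flag
    }
}
```

With this shape, the code that reads the flag is also the code that clears it, so the heartbeat path would not need to touch it.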


> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, HADOOP-3245-v2.6.9.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be applied for things like jobs being able to survive jobtracker restarts.
