Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2008/05/27 17:58:04 UTC

[jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600166#action_12600166 ] 

Amar Kamat commented on HADOOP-3245:
------------------------------------

Here is a proposal:
1) A job can be in one of the following states: _queued, running, completed_.
    Queued and running jobs need to survive JT restarts and hence will be represented as _queued_ and _running_ folders under a backup directory; keeping the two folders separate makes it cleaner to distinguish queued jobs from running ones. We need to decide whether this feature should be configurable (in terms of enabling backup mode and specifying the backup location) or whether the JT should always check the _queued/running_ folders under *mapred.system.dir* to auto-detect that it got restarted. For now this backup directory will be *mapred.system.dir*.
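To make the auto-detect idea concrete, here is a minimal sketch (not actual JobTracker code; the folder names and the default path are assumptions taken from this proposal) of how the JT could check for leftover state under *mapred.system.dir* on startup:
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RestartDetector {
  /** Returns true if stale queued/running state exists, i.e. the JT is restarting. */
  public static boolean isRestart(Configuration conf) throws IOException {
    Path sysDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
    FileSystem fs = sysDir.getFileSystem(conf);
    Path queued = new Path(sysDir, "queued");
    Path running = new Path(sysDir, "running");
    return (fs.exists(queued) && fs.listStatus(queued).length > 0)
        || (fs.exists(running) && fs.listStatus(running).length > 0);
  }
}
{noformat}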

2) On job submission, the job first gets queued up (it goes into the job structures and is also persisted on the FS as queued). For now we will use a common structure to hold both the queued and running jobs. In the future (say, after the new scheduler comes in) we might need separate queues/lists for the two.
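A rough sketch of the persistence half of this step, assuming a <mapred.system.dir>/queued/<job-id> marker layout (the helper name and the layout are assumptions, not existing code):
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QueuedJobPersister {
  /** Called from job submission after the job enters the in-memory structures. */
  public static void persistQueued(Configuration conf, String jobId) throws IOException {
    Path queuedDir = new Path(conf.get("mapred.system.dir"), "queued");
    FileSystem fs = queuedDir.getFileSystem(conf);
    fs.mkdirs(queuedDir);
    // An empty marker file is enough here; job.xml under mapred.system.dir
    // already holds the job definition itself.
    fs.create(new Path(queuedDir, jobId)).close();
  }
}
{noformat}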

3) A job moves from the queued state to the running state only when the {{JobInitThread}} selects it for *initialization/running* [consider an initialized/expanded job to be a running job]. As of now all jobs will transition from queued to running immediately, but in the future the decision of which job to initialize will be considerably more involved.
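The queued-to-running transition could then be a single rename of the marker, so a crash leaves the job in exactly one of the two folders (HDFS renames are atomic metadata operations). Again this is only a sketch and the names are made up:
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JobStateStore {
  /** Invoked when the JobInitThread picks the job for initialization. */
  public static boolean promoteToRunning(Configuration conf, String jobId) throws IOException {
    Path sysDir = new Path(conf.get("mapred.system.dir"));
    FileSystem fs = sysDir.getFileSystem(conf);
    Path runningDir = new Path(sysDir, "running");
    fs.mkdirs(runningDir);
    // Move <sysdir>/queued/<job-id> to <sysdir>/running/<job-id>.
    return fs.rename(new Path(new Path(sysDir, "queued"), jobId),
                     new Path(runningDir, jobId));
  }
}
{noformat}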

4)  Running jobs need the following information for restarts :
{noformat}
   4.1) TIP info : Which TIPs make up the job and what their locality info is. 
           This can be rebuilt from job.xml, which is in *mapred.system.dir*. Hence on a JT restart we should be careful 
           while clearing the mapred system directory. Currently the JobTracker will switch to RESTART mode if there 
           is stale data in the queued/running backup folders. If we decide to keep the backup feature configurable then the
           JT will also check whether it is enabled.
   4.2) Completed task statuses : (Sameer's suggestion) TaskStatus info can be obtained from the TTs. 
           More details are given below (see SYNC ALGO).
{noformat}
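For 4.1, a sketch of where the rebuild would start from, assuming job.xml survives under <mapred.system.dir>/<job-id>/ (the exact layout and the helper are illustrative, not the real recovery code):
{noformat}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class JobRecoverySketch {
  /** Reload the submitted configuration of a job that was running before the restart. */
  public static JobConf readJobConf(Path sysDir, String jobId) {
    JobConf job = new JobConf();
    job.addResource(new Path(new Path(sysDir, jobId), "job.xml"));
    // TIPs and their locality info would then be recreated from the job's splits,
    // exactly as during a normal job initialization.
    return job;
  }
}
{noformat}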

5) _SYNC ALGO_ : Algorithm for syncing up the JT and the TTs :
{noformat}
   5.1)  On-Demand Sync : 
     Have a SYNC operation for the TaskTrackers. Following are the ways to achieve on-demand sync
       5.1.1) Greedy :
           a) A TT sends an old heartbeat to the newly restarted JT. The JT on restart checks the backup folders and detects
                whether it is in restart mode or not.
           b) Once the JT in restart mode receives a heartbeat which is not the *first* contact, it considers that the 
               TT is from its previous incarnation and sends a SYNC command.
           c) The TT receives the SYNC operation, adds the completed statuses of running jobs to the current heartbeat 
               and SYNCs up with the JT, marking this contact as *initial*.
           d) The JT receives the updated heartbeat as a new contact and updates its internal structures.
           e) The JT replies with new tasks if asked.
     5.1.2) Waited :
           Similar to 5.1.1 but doesn't give out tasks immediately; it waits for some time and then serves out the tasks. 
          The questions to answer are: how long to wait, and how to detect that all the TTs have SYNCed?
     For 5.1.1, the rate at which the TTs SYNC with the JT will be faster and hence the 
     overhead should not be much. Also we could avoid scheduling tasks on the SYNC operation. Thoughts?
   5.2) Other options?
{noformat}
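A hypothetical sketch of the greedy handshake in 5.1.1; none of these types exist as-is in the JT/TT code, they only illustrate the decision the JT has to make per heartbeat:
{noformat}
import java.util.Set;

public class GreedySyncSketch {
  enum JTCommand { SYNC, LAUNCH_TASK, NO_OP }

  /**
   * JT side: a heartbeat that is not marked as the TT's first contact, coming from a
   * tracker the restarted JT does not know about, must come from the previous
   * incarnation, so the JT asks the TT to SYNC. The TT then resends the completed
   * statuses of running jobs and marks its next contact as initial.
   */
  JTCommand handleHeartbeat(String trackerName, boolean initialContact,
                            boolean inRestartMode, Set<String> knownTrackers) {
    if (inRestartMode && !initialContact && !knownTrackers.contains(trackerName)) {
      return JTCommand.SYNC;
    }
    knownTrackers.add(trackerName);
    return JTCommand.NO_OP; // or LAUNCH_TASK if the TT asked for work (step e)
  }
}
{noformat}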

6) Problems/Issues : 
I) Once the JT restarts, the JIP structures for previously completed jobs will be missing. Hence the web-ui will change with respect to _completed_ jobs: earlier the JT showed the completed jobs, which after a restart it will not be able to do. One workaround is to use the _completed-job-store_ to persist completed jobs and serve them from job history after restarts.
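A hypothetical sketch of that workaround (the CompletedStore interface is made up; in practice it would be backed by the completed-job-store / job history files):
{noformat}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.mapred.JobStatus;

public class JobStatusLookupSketch {
  /** Made-up interface standing in for the completed-job-store / job history reader. */
  interface CompletedStore {
    JobStatus readJobStatus(String jobId);
  }

  private final Map<String, JobStatus> liveJobs = new HashMap<String, JobStatus>();
  private final CompletedStore store;

  JobStatusLookupSketch(CompletedStore store) { this.store = store; }

  /** Serve from memory if the JIP is still around, else fall back to persisted history. */
  JobStatus getJobStatus(String jobId) {
    JobStatus status = liveJobs.get(jobId);
    return (status != null) ? status : store.readJobStatus(jobId);
  }
}
{noformat}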

II) Consider the following scenario :
{noformat}
   1. JT schedules task1, task2 to TT1
   2. JT schedules task3, task4 to TT2
   3. TT1 informs JT about task1/task2 completion
   4. JT restarts 
   5. JT receives SYNC from TT1.
   6. JT syncs up and schedules task3 to TT1
   7. TT1 starts task3 and this might interfere with the side-effect files of the old task3 attempt on TT2.
   8. In the worst case the old task3 attempt could still be running on TT2 when the JT schedules task3 on TT2 again, 
       in which case the local folders will also clash.
{noformat} 
    One way to overcome this is to include an identifier in the task attempt ids that distinguishes attempts across JT restarts. We can use the JT's start timestamp as that identifier.
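For example (the exact id format below is illustrative, not the real task attempt id layout), the JT's start time could be embedded like this:
{noformat}
public class AttemptIdSketch {
  /** Encode the JT start timestamp so attempts from different JT incarnations never collide. */
  static String attemptId(long jtStartTime, String jobSeq, String taskSeq, int attempt) {
    return "attempt_" + jtStartTime + "_" + jobSeq + "_" + taskSeq + "_" + attempt;
  }
  // attemptId(1211904000000L, "0007", "m_000003", 0)
  //   -> attempt_1211904000000_0007_m_000003_0
}
{noformat}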

III) The logic for _detecting_ a lost TT should not rely on missing data structures but should use some kind of bookkeeping. The 'missing data structures' logic can then be used to detect when a TT should SYNC. Note that detecting a TT as lost (missing TT details) is different from declaring it lost (a 10-minute gap in heartbeats).
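A sketch of the bookkeeping this implies (names are made up): track the last heartbeat time per TT explicitly, so "unknown TT, needs SYNC" and "expired TT, declare lost" are two separate checks:
{noformat}
import java.util.HashMap;
import java.util.Map;

public class TrackerBookkeepingSketch {
  private static final long TRACKER_EXPIRY_MS = 10 * 60 * 1000; // the 10 min window above
  private final Map<String, Long> lastHeartbeat = new HashMap<String, Long>();

  /** Unknown after a restart => the TT should be asked to SYNC. */
  synchronized boolean isKnown(String tracker) { return lastHeartbeat.containsKey(tracker); }

  synchronized void recordHeartbeat(String tracker) {
    lastHeartbeat.put(tracker, System.currentTimeMillis());
  }

  /** Declared lost only when the heartbeat gap exceeds the expiry window. */
  synchronized boolean isLost(String tracker) {
    Long last = lastHeartbeat.get(tracker);
    return last != null && System.currentTimeMillis() - last > TRACKER_EXPIRY_MS;
  }
}
{noformat}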
----
So for now we should
1) Make the backup non-configurable and use *mapred.system.dir* as the backup folder, with _queued/running_ folders under it
2) Have the queuing logic just for persistence 
3) Use job history for serving completed jobs after restarts
4) Change the lost-TT _detection_ logic
5) Use the _On-Demand:Greedy_ sync logic 
6) Have task attempts carry an encoded JT timestamp
----
Thoughts?


> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>             Fix For: 0.18.0
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be applied for things like jobs being able to survive jobtracker restarts.
