Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2008/04/14 18:37:04 UTC

[jira] Commented: (HADOOP-3245) Provide ability to persist running jobs (extend HADOOP-1876)

    [ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588643#action_12588643 ] 

Hemanth Yamijala commented on HADOOP-3245:
------------------------------------------

Devaraj and I had a brief discussion today and came up with the following points:

One requirement that we felt should be addressed: if a job is partially complete when the JobTracker restarts, the user would expect to get information about all of the job's completed tasks transparently.

To address this requirement, we would need to persist information about every completed task. We could probably take an approach similar to the NameNode's edit log mechanism. We could have a master image file that stores a snapshot of the current state of a running job. When tasks of the job change state, we would write each update immediately to a log, and periodically merge the logged updates into the master image file.
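A minimal sketch of the image-plus-log idea, assuming a JSON snapshot and one JSON record per log line. This is not the JobTracker's actual design or API; the class JobStateStore, the file names job.image and job.edits, and the task IDs are all hypothetical. Recovery loads the last image and replays the log on top of it, so a restarted process sees every update recorded before the crash.

```python
import json
import os
import tempfile


class JobStateStore:
    """Hypothetical image + edit-log store for one job's task states.

    The image is a JSON snapshot; the log holds one JSON record per
    task-state change made since the last checkpoint.
    """

    def __init__(self, directory):
        self.image_path = os.path.join(directory, "job.image")
        self.log_path = os.path.join(directory, "job.edits")

    def record(self, task_id, state):
        # Append the update immediately and force it to disk, so a
        # restart does not lose an update once record() has returned.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"task": task_id, "state": state}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def recover(self):
        # Start from the last snapshot, then replay newer log records.
        states = {}
        if os.path.exists(self.image_path):
            with open(self.image_path, encoding="utf-8") as image:
                states = json.load(image)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    if not line.strip():
                        continue
                    try:
                        entry = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-write
                    states[entry["task"]] = entry["state"]
        return states

    def checkpoint(self):
        # Merge the log into a new image, swap it in atomically, then
        # start an empty log.
        states = self.recover()
        tmp_path = self.image_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as image:
            json.dump(states, image)
            image.flush()
            os.fsync(image.fileno())
        os.replace(tmp_path, self.image_path)
        open(self.log_path, "w", encoding="utf-8").close()


with tempfile.TemporaryDirectory() as d:
    store = JobStateStore(d)
    store.record("task_0001_m_000000", "SUCCEEDED")
    store.record("task_0001_m_000001", "RUNNING")
    store.checkpoint()
    store.record("task_0001_m_000001", "SUCCEEDED")
    # Simulate a restart: a fresh object reads the files back.
    print(JobStateStore(d).recover())
```

Because replaying a record just sets a task's latest state, replay is idempotent: a crash after the new image is swapped in but before the log is emptied only causes some records to be applied twice, with the same result.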

An alternative approach would be to update the image file periodically, batching updates. However, in the interest of scale, and considering that tasks may be updated frequently, we felt the first approach is the better one.
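For contrast, a minimal sketch of the batched alternative, again with hypothetical names (BatchedImageStore, job.image, batch_size) and a JSON image. It only illustrates the trade-off weighed above: updates held in memory since the last flush are lost if the process dies, and every flush rewrites the whole image, so its cost grows with the number of tasks rather than with the number of new updates.

```python
import json
import os
import tempfile


class BatchedImageStore:
    """Hypothetical alternative: keep updates in memory and rewrite the
    whole image every `batch_size` updates."""

    def __init__(self, directory, batch_size):
        self.image_path = os.path.join(directory, "job.image")
        self.batch_size = batch_size
        self.states = self._load()
        self.pending = 0        # updates not yet written to disk
        self.bytes_written = 0  # total image bytes written by flushes

    def _load(self):
        if os.path.exists(self.image_path):
            with open(self.image_path, encoding="utf-8") as image:
                return json.load(image)
        return {}

    def record(self, task_id, state):
        # The update lives only in memory until the next flush.
        self.states[task_id] = state
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        # Each flush serializes every task's state, not just the new ones.
        data = json.dumps(self.states)
        tmp_path = self.image_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as image:
            image.write(data)
        os.replace(tmp_path, self.image_path)
        self.bytes_written += len(data)
        self.pending = 0


with tempfile.TemporaryDirectory() as d:
    store = BatchedImageStore(d, batch_size=3)
    for i in range(5):
        store.record(f"task_{i}", "SUCCEEDED")
    # Simulate a crash: a fresh object only sees the flushed image,
    # so the last two updates are gone.
    print(len(BatchedImageStore(d, batch_size=3).states))  # 3
```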

Another point we considered was whether we could store this information on DFS, as in HADOOP-1876. However, given that DFS does not support appends, and that a JobTracker restart may cause us to lose information written to this file, we feel that approach may not work very well.

Comments?

> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>             Fix For: 0.18.0
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be applied for things like jobs being able to survive jobtracker restarts.

-- 
This message is automatically generated by JIRA.