Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2008/10/06 09:47:44 UTC

[jira] Created: (HADOOP-4350) Ability to pause/resume jobs

Ability to pause/resume jobs
----------------------------

                 Key: HADOOP-4350
                 URL: https://issues.apache.org/jira/browse/HADOOP-4350
             Project: Hadoop Core
          Issue Type: New Feature
          Components: mapred
            Reporter: Amar Kamat


Consider a case where the user's job depends on some external entity/service like a database or a web service. If the service needs a restart or encounters a failure, the user should be able to pause the job and resume it only when the service is back up. This would be better than re-executing the whole job. Hence there should be some way to pause/resume jobs, e.g. from the web UI or the command line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637070#action_12637070 ] 

Amar Kamat commented on HADOOP-4350:
------------------------------------

HADOOP-3687 will be useful to HADOOP-4350. When a _job-pause_ is issued, we should ideally pause the tasks as well, instead of killing them or waiting for their completion. But with the scheduler in the picture, there will be a lot of jobs running simultaneously, and hence killing the current wave of tasks for a job should not affect the job as such.



[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638201#action_12638201 ] 

dhruba borthakur commented on HADOOP-4350:
------------------------------------------

We would definitely like to use the feature to pause all current jobs, typically before the start of a scheduled HDFS maintenance window. This is, in some sense, similar to the "safemode" of HDFS.




[jira] Assigned: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat reassigned HADOOP-4350:
----------------------------------

    Assignee: Amar Kamat



[jira] Updated: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4350:
-------------------------------

    Attachment: HADOOP-4350-v1.3.patch

Fixed the race condition in {{TestJobPause}}.



[jira] Updated: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4350:
-------------------------------

    Attachment: HADOOP-4350-v1.2.patch

Attaching a patch that gets the basic feature working.



[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637846#action_12637846 ] 

Devaraj Das commented on HADOOP-4350:
-------------------------------------

I agree with Owen. Offline, Amar mentioned to me that the main intention behind this issue was to support a namenode bounce (in which case the service discussed in this JIRA would be the namenode service). I can see that point. However, the thing to note here is that when the namenode is unavailable, the JT itself won't be able to do anything useful (no new jobs can be launched, new task launches trying to use the DFS would die, etc.). So if we just address the problem of JT pause (where we pause all jobs), as opposed to a single-job pause, it should be enough.



[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637658#action_12637658 ] 

Owen O'Malley commented on HADOOP-4350:
---------------------------------------

I think this is not a good direction to go. It would be much better to use the scheduler to give priority to the jobs that need it. By pausing the jobs and tasks, you'll consume resources that will block effective work by other jobs.



[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637062#action_12637062 ] 

Vinod K V commented on HADOOP-4350:
-----------------------------------

HADOOP-3687 is a related issue.



[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637034#action_12637034 ] 

Amar Kamat commented on HADOOP-4350:
------------------------------------

We can have two knobs:
1) soft/cold pause : the scheduled tasks are allowed to run to completion and no more tasks are scheduled from the job. The job is marked as {{PAUSED}}.
2) hard/hot pause : the scheduled tasks are killed and no more tasks are scheduled from the job. The job is marked as {{PAUSED}}.
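
As a rough illustration of the two knobs above, here is a minimal Java sketch. It is only a toy model: {{PausableJob}}, {{PauseMode}}, {{scheduleNext()}} and the other names are hypothetical and are not taken from the attached patches or from the JobTracker code. It also assumes that tasks killed by a hard pause are re-queued on resume, which this issue does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of soft vs. hard pause. All names here are hypothetical. */
class PausableJob {
    enum PauseMode { SOFT, HARD }
    enum JobState { RUNNING, PAUSED }
    enum TaskState { PENDING, RUNNING, KILLED }

    private JobState state = JobState.RUNNING;
    private final List<TaskState> tasks = new ArrayList<>();

    PausableJob(int numTasks) {
        for (int i = 0; i < numTasks; i++) {
            tasks.add(TaskState.PENDING);
        }
    }

    /** Launches the next pending task; returns its index, or -1 if nothing can be scheduled. */
    int scheduleNext() {
        if (state == JobState.PAUSED) {
            return -1; // both pause modes stop further scheduling
        }
        int i = tasks.indexOf(TaskState.PENDING);
        if (i >= 0) {
            tasks.set(i, TaskState.RUNNING);
        }
        return i;
    }

    /** SOFT leaves running tasks alone; HARD kills them. Either way the job becomes PAUSED. */
    void pause(PauseMode mode) {
        if (mode == PauseMode.HARD) {
            for (int i = 0; i < tasks.size(); i++) {
                if (tasks.get(i) == TaskState.RUNNING) {
                    tasks.set(i, TaskState.KILLED);
                }
            }
        }
        state = JobState.PAUSED;
    }

    /** Assumption: tasks killed by a hard pause go back to PENDING so they run again. */
    void resume() {
        for (int i = 0; i < tasks.size(); i++) {
            if (tasks.get(i) == TaskState.KILLED) {
                tasks.set(i, TaskState.PENDING);
            }
        }
        state = JobState.RUNNING;
    }

    JobState getState() { return state; }

    TaskState getTaskState(int i) { return tasks.get(i); }
}
```

In this sketch the only difference between the two modes is what happens to tasks that are already running at pause time; the job-level state and the scheduling behaviour are the same in both.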



[jira] Commented: (HADOOP-4350) Ability to pause/resume jobs

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637043#action_12637043 ] 

Amar Kamat commented on HADOOP-4350:
------------------------------------

The question one needs to answer here is how long we should keep paused jobs in memory, and how many of them.
