Posted to common-dev@hadoop.apache.org by "Karam Singh (JIRA)" <ji...@apache.org> on 2009/05/08 15:19:45 UTC

[jira] Commented: (HADOOP-5794) Sometimes job does not get removed from scheduler queue after it is killed

    [ https://issues.apache.org/jira/browse/HADOOP-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707336#action_12707336 ] 

Karam Singh commented on HADOOP-5794:
-------------------------------------

Cluster setup:
Cluster Capacity = 204 maps, 204 reduces
4 queues 
Q1 Capacity Percent= 40
Q2 Capacity Percent= 40
Q3 Capacity Percent= 40
Q4 Capacity Percent= 40

Each queue has user limit = 100%.
Submitted 8 jobs to each queue: 32 sleep jobs in total, each with maps=10000 (sleep time 5 secs) and reduces=2 (sleep time 1 min).
All jobs were initialized. Of these, maps of 4 jobs started running. When at least 1000 maps of each of them had completed, restarted the JobTracker.
After the JobTracker recovered, waited until 4 jobs had completed, then killed all remaining 28 jobs.
All jobs were killed successfully.
The JobTracker web UI displayed all killed jobs under the failed jobs list. hadoop job -list all also displays the status of the 28 killed jobs as 5.
While browsing the jobqueue_details.jsp pages of the queues, found that 2 of the killed jobs had not been removed from the capacity scheduler's queue. Maps of both jobs were running before the kill was sent to them.
To check whether the cluster was blocked because of this, submitted 3 more jobs to each queue where the 2 killed jobs were listed, and verified that the newly submitted jobs ran successfully.
Waited up to 20 mins before shutting down the cluster.
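
For anyone trying to reproduce this, the submission and kill steps above could be scripted roughly as follows. This is only a sketch that builds the command lines without running them. The examples jar path, the "sleep" example program with its -m/-r/-mt/-rt options, and the mapred.job.queue.name property are assumptions on my part and are not taken from the steps above; the queue names Q1..Q4 and the job counts are.

```python
# Sketch only: builds (does not run) the commands for the scenario above.
# Assumptions not stated in the report: the examples jar path, the "sleep"
# example program and its -m/-r/-mt/-rt options (sleep times in msec), and
# the mapred.job.queue.name property used to pick the queue.

QUEUES = ["Q1", "Q2", "Q3", "Q4"]
JOBS_PER_QUEUE = 8
EXAMPLES_JAR = "hadoop-examples.jar"  # hypothetical path, adjust locally

# Numeric job states as printed by "hadoop job -list all" (0.20 JobStatus).
JOB_STATES = {1: "RUNNING", 2: "SUCCEEDED", 3: "FAILED", 4: "PREP", 5: "KILLED"}


def sleep_job_cmd(queue):
    """Command line for one sleep job: 10000 maps (5 s), 2 reduces (1 min)."""
    return ["hadoop", "jar", EXAMPLES_JAR, "sleep",
            "-D", "mapred.job.queue.name=" + queue,
            "-m", "10000", "-r", "2",
            "-mt", "5000", "-rt", "60000"]


def submit_cmds():
    """One command per job: 8 jobs for each of the 4 queues."""
    return [sleep_job_cmd(q) for q in QUEUES for _ in range(JOBS_PER_QUEUE)]


def kill_cmd(job_id):
    """Command line to kill one job by its id."""
    return ["hadoop", "job", "-kill", job_id]


if __name__ == "__main__":
    for cmd in submit_cmds():
        print(" ".join(cmd))
```

After killing the jobs, compare the state column of hadoop job -list all (5 maps to KILLED in the table above) with what jobqueue_details.jsp still lists for each queue.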


> Sometimes job does not get removed from scheduler queue after it is killed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-5794
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5794
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.0
>            Reporter: Karam Singh
>
> Sometimes when we kill a job, it does not get removed from the waiting queue, even though the job status is "Killed" with Job Setup and Cleanup: "Successful".
> The JobTracker web UI also shows the job under the failed jobs list, and hadoop job -list all and hadoop queue <queuename> -showJobs both show the job's state=5.
> Prior to the kill, the job state was "Running".
