Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2009/01/07 12:26:44 UTC

[jira] Created: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero
------------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-4988
                 URL: https://issues.apache.org/jira/browse/HADOOP-4988
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/capacity-sched
            Reporter: Vivek Ratan
            Priority: Blocker


HADOOP-4373 introduced a fix for queues with guaranteed capacity (gc) equal to zero. Part of that fix changed the queue comparator used to sort queues, so that queues with gc=0 are placed at the end. This breaks the code for reclaiming capacity, which assumes that queues are sorted by available free space and that a queue with gc=0 is no different from a queue that is running at capacity. As a result, in a system with at least one queue whose gc=0, we may fail to reclaim capacity for some queues.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661526#action_12661526 ] 

Vivek Ratan commented on HADOOP-4988:
-------------------------------------

A detailed explanation: in TaskSchedulingMgr.reclaimCapacity, we stop looking for capacity to reclaim if no queue is running over capacity. We determine this by looking at the last queue in the sorted collection and checking whether its number of running tasks is <= its gc. If queues with gc=0 are placed at the end of the collection, this condition is true for them (a queue with gc=0 and no running tasks trivially satisfies it), so we stop looking for capacity to reclaim on the very first pass.

Queues with gc=0 should be treated the same as queues with (# of running tasks == gc). 
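
The early exit described above can be illustrated with a minimal sketch. It is not the actual TaskSchedulingMgr code: the class `ReclaimStopSketch`, the `Q` type, and the method `stopsReclaiming` are hypothetical names, and the slot counts are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and fields are illustrative, not the real Hadoop classes.
final class ReclaimStopSketch {
    static final class Q {
        final String name;
        final int gc;      // guaranteed capacity, in slots
        final int running; // tasks currently running

        Q(String name, int gc, int running) {
            this.name = name;
            this.gc = gc;
            this.running = running;
        }
    }

    /** Mirrors the described check: only the last queue in sorted order is inspected. */
    static boolean stopsReclaiming(List<Q> sorted) {
        Q last = sorted.get(sorted.size() - 1);
        return last.running <= last.gc;
    }

    public static void main(String[] args) {
        Q under = new Q("under", 10, 4);
        Q over = new Q("over", 10, 15);
        Q zero = new Q("zero", 0, 0);

        // gc=0 queue sorted last: the check passes even though "over" exceeds its gc.
        System.out.println(stopsReclaiming(Arrays.asList(under, over, zero)));
        // gc=0 queue placed where an at-capacity queue would sort: "over" is last.
        System.out.println(stopsReclaiming(Arrays.asList(under, zero, over)));
    }
}
```

With the gc=0 queue at the end, the check reports that nothing is over capacity and the scan stops. With the gc=0 queue placed where a queue running exactly at capacity would sort, the over-capacity queue is last and the scan continues.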



[jira] Updated: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4988:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
         Assignee: Vivek Ratan
           Status: Resolved  (was: Patch Available)

I just committed this to trunk and the Hadoop 0.20 branch. Thanks, Vivek!



[jira] Commented: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664222#action_12664222 ] 

Hemanth Yamijala commented on HADOOP-4988:
------------------------------------------

Looks good. +1.



[jira] Updated: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4988:
--------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663970#action_12663970 ] 

Hemanth Yamijala commented on HADOOP-4988:
------------------------------------------

bq. removed code in assignTasks() that depended on queues with gc=0 being at the end of the collection.

Vivek, I started looking at this patch. If a queue has no capacity, we should not be giving it a task. With that code removed, the patch would hand out a task to such a queue, which is wrong. The real problem is that previously, because queues with 0 capacity sorted to the end, we assumed that once we reached one there was no need to look at the remaining queues. That is what we should change: skip the zero-capacity queue and keep looking at the other queues as well. Makes sense?
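
The difference between stopping at a zero-capacity queue and skipping past it can be sketched as below. This is only an illustration under assumed names (`AssignSketch`, `Q`, `pickQueue`); it is not the real assignTasks() code, and it reduces the decision to "has capacity and has a pending task".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the capacity scheduler's actual assignTasks() logic.
final class AssignSketch {
    static final class Q {
        final String name;
        final int gc;                 // guaranteed capacity, in slots
        final boolean hasPendingTask; // whether a task is waiting in this queue

        Q(String name, int gc, boolean hasPendingTask) {
            this.name = name;
            this.gc = gc;
            this.hasPendingTask = hasPendingTask;
        }
    }

    /** Returns the name of the first queue that may be given a task, or null if none. */
    static String pickQueue(List<Q> queues) {
        for (Q q : queues) {
            if (q.gc == 0) {
                // Never hand a task to a zero-capacity queue, but do not stop
                // the scan either: later queues may still be eligible.
                continue;
            }
            if (q.hasPendingTask) {
                return q.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Q zero = new Q("zero", 0, true);
        Q normal = new Q("normal", 5, true);
        System.out.println(pickQueue(Arrays.asList(zero, normal)));
    }
}
```

Using `continue` rather than `break` keeps both properties from the comment: a queue with gc=0 never receives a task, and its position in the collection no longer cuts the scan short.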



[jira] Updated: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4988:
--------------------------------

    Attachment: 4988.1.patch

Attached patch (4988.1.patch) with a simple fix:
* changed the queue comparator to treat queues with gc=0 as queues running at capacity
* removed code in assignTasks() that depended on queues with gc=0 being at the end of the collection
* added a new test that checks for reclaiming capacity with a queue having gc=0
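
The comparator change in the first bullet can be sketched as follows. This is a minimal illustration, not the code in 4988.1.patch: the class `ComparatorSketch`, the `Q` type, and the use of a running/gc ratio as the sort key are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; the real queue comparator in the capacity scheduler may differ.
final class ComparatorSketch {
    static final class Q {
        final String name;
        final int gc;      // guaranteed capacity, in slots
        final int running; // tasks currently running

        Q(String name, int gc, int running) {
            this.name = name;
            this.gc = gc;
            this.running = running;
        }
    }

    /** Assumed sort key: running/gc, with gc=0 pinned to 1.0, i.e. "at capacity". */
    static double load(Q q) {
        return q.gc == 0 ? 1.0 : (double) q.running / q.gc;
    }

    // Least loaded queues first, most over-capacity queues last.
    static final Comparator<Q> BY_LOAD = Comparator.comparingDouble(ComparatorSketch::load);

    /** Returns queue names in sorted order without mutating the input list. */
    static List<String> sortedNames(List<Q> queues) {
        List<Q> copy = new ArrayList<>(queues);
        copy.sort(BY_LOAD);
        List<String> names = new ArrayList<>();
        for (Q q : copy) {
            names.add(q.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Q> queues = Arrays.asList(
            new Q("over", 10, 15), new Q("zero", 0, 0), new Q("under", 10, 4));
        System.out.println(sortedNames(queues));
    }
}
```

Under these assumptions the gc=0 queue sorts between the under-capacity and over-capacity queues, so any over-capacity queue still ends up last in the collection.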




[jira] Updated: (HADOOP-4988) An earlier fix, for HADOOP-4373, results in a problem with reclaiming capacity when one or more queues have a capacity equal to zero

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4988:
--------------------------------

    Attachment: 4988.2.patch

bq. If a queue has no capacity, we should not be giving a task. 

Good catch. I had forgotten about this check. That has been added, I've synced with trunk, and a new patch (4988.2.patch) is attached. I've run dos2unix on it, and the output of ant test-patch is below: 

{code}
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
{code}

