You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2009/01/05 07:35:44 UTC

[jira] Created: (HADOOP-4980) Cleanup the Capacity Scheduler code

Cleanup the Capacity Scheduler code
-----------------------------------

                 Key: HADOOP-4980
                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: Vivek Ratan


Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4980:
--------------------------------

    Attachment: 4980.2.patch

Attached new patch (4980.2.patch): contains changes due to resync with trunk. 

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661527#action_12661527 ] 

Hemanth Yamijala commented on HADOOP-4980:
------------------------------------------

Vivek,

This looks really good. The code structure is much better now and simpler to follow.

A few minor comments:

1. In the toString of TaskSchedulingInfo, we can use float instead of int, like we do for running tasks per queue, so that users who are running fractional values will also be represented correctly. And just to avoid confusion, should we exclude users who have no running tasks ?
2. The strings "Map Tasks" can be printed in TaskSchedulingInfo.toString, as that's the class that has knowledge about what is being printed.
3. There was code in assignTasks that did not return a task for queues with zero GC. This seems to have been inadvertently removed.
4. Rather than passing in the JobQueuesManager object to the QSI, which is a rather loaded object, can we just pass the boolean supportsPriority, which is all the information that the QSI needs ? That way, it will be one less place to check if we make any change to JobQueuesManager. Note that we are already reading the value of the configuration in the start() method.

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661836#action_12661836 ] 

Vivek Ratan commented on HADOOP-4980:
-------------------------------------

Results of running 'ant patch': 

{code}
Results:
     [exec] +1 overall. 
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
{code}


> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch, 4980.3.patch, 4980.4.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4980:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
         Assignee: Vivek Ratan
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch 0.20. As such this is only an improvement. However, it seemed a necessary change to address blocker bugs like HADOOP-4977 and HADOOP-4981.

Thanks, Vivek !

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>            Assignee: Vivek Ratan
>             Fix For: 0.20.0
>
>         Attachments: 4980.1.patch, 4980.2.patch, 4980.3.patch, 4980.4.patch, 4980.5.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4980:
-------------------------------------

    Component/s: contrib/capacity-sched

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4980:
--------------------------------

    Attachment: 4980.5.patch

Atatching new patch (4980.5.patch) after running dos2unix on the previous patch

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch, 4980.3.patch, 4980.4.patch, 4980.5.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4980:
--------------------------------

    Attachment: 4980.3.patch

Attaching new patch (4980.3.patch). 

bq. 1. In the toString of TaskSchedulingInfo, we can use float instead of int, like we do for running tasks per queue, so that users who are running fractional values will also be represented correctly. And just to avoid confusion, should we exclude users who have no running tasks ?

Good points. I now print out a float value, and I only print users who have at least one running task. 

bq. 2. The strings "Map Tasks" can be printed in TaskSchedulingInfo.toString, as that's the class that has knowledge about what is being printed.

The QSI knows which of the two TSIs is for a map and which is for a reduce. So it can display the right string. Having the TSI display the Map or Reduce string is harder. It can be done but I think it's fine for the QSI to print what it does. 

bq. 3. There was code in assignTasks that did not return a task for queues with zero GC. This seems to have been inadvertently removed.

Good catch. I've added that code in. It got removed by mistake. 

bq. 4. Rather than passing in the JobQueuesManager object to the QSI, which is a rather loaded object, can we just pass the boolean supportsPriority, which is all the information that the QSI needs ? That way, it will be one less place to check if we make any change to JobQueuesManager. Note that we are already reading the value of the configuration in the start() method.

The QSI needs a JobQueuesManager object for other things: to print out job-related information, for example. And this list will likely grow in the future, so it should have access to the JobQueuesManager object. 




> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch, 4980.3.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4980:
--------------------------------

    Attachment: 4980.4.patch

Attaching 4980.4.patch, where I've fixed that problem for both the % of running tasks and % of capacity. In addition, we print the numbers with one decimal place. 

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch, 4980.3.patch, 4980.4.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661559#action_12661559 ] 

Hemanth Yamijala commented on HADOOP-4980:
------------------------------------------

Very minor nit, (and actually a bug in existing code too):

In toString of TaskSchedulingInfo, one of the values used for computing runningTasksAsPercent and the user's running tasks percentage should be cast to a float. Otherwise, the value gets converted to a float after the computation happens - which does not address the original concern.

Otherwise +1 for the patch.

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch, 4980.3.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661048#action_12661048 ] 

vivekr edited comment on HADOOP-4980 at 1/5/09 9:30 PM:
-------------------------------------------------------------

No new test cases are added, as this is just refactoring. I did have to modify some tests, as the display information has changed. 
Oh, I forgot to mention another change: The display string has changed, for the better I think. We print out the queue information first, followed by info on map and reduce tasks, followed by job-specific info.

      was (Author: vivekr):
    No new test cases are added, as this is just refactoring. I did have to modify some tests, as the display information has changed. 
Oh, I forgot to add another change: The display string has changed, for the better I think. We print out the queue information first, followed by info on map and reduce tasks, followed by job-specific info.
  
> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661048#action_12661048 ] 

Vivek Ratan commented on HADOOP-4980:
-------------------------------------

No new test cases are added, as this is just refactoring. I did have to modify some tests, as the display information has changed. 
Oh, I forgot to add another change: The display string has changed, for the better I think. We print out the queue information first, followed by info on map and reduce tasks, followed by job-specific info.

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4980:
--------------------------------

    Attachment: 4980.1.patch

Attaching patch 4980.1.patch. Following changes have been made: 
* _tasksKilled_ member variable of _ReclaimResource_ class removed (never used anywhere)
* Some of the member variables in _QueueSchedulingInfo_, such as the user limit _ulMin_, applied to a queue while others, such as _guaranteedCapacityPercent_, were applicable to both map and reduce tasks. Rather than have two separate objects of type _QueueSchedulingInfo_, one each for map and reduce tasks, and thus duplicating the queue-specific information, we now have a separate _TaskSchedulingInfo_ class that encapsulates map/reduce-specific information. Queue-specific information remains in _QueueSchedulingInfo_. This also makes it much easier to print UI information. Each class' _toString_ method prints out its relevant information, which can be concatenated. 
* Renamed _SchedulingInfo_ class to _SchedulingDisplayInfo_. This class has been changed to be outside of the Scheduler - it is only aware of the name of the queue it represents. Its _toString_ method, that is called to obtain the UI display information, now simply gets the display information from the scheduler object. This does three things: the _SchedulingDisplayInfo_ interacts with the scheduler object and is oblivious to its internal data structures, the display information is created by the _QueueSchedulingInfo_ (which contains all data), and the call to the scheduler object is synchronized. 
* The _TaskLookupResult_ and _TaskLookUpStatus_ classes have been reorganized. The latter is moved inside the former, and static factory methods have been provided to create the common task lookup objects. The objects that represent no task being found and task not matching memory requirements do not need to be created again again and again as they're basically read-only, so we avoid creating many such objects by sharing a single instance. 
* I've cleaned up the logging code. We were logging the same result in multiple places. 
* Some methods and variables have been renamed to better denote what they represent. 
* The core scheduling algorithm, that of obtaining a task from a queue, is all in one place, within _getTaskFromQueue_. This improves readability and makes it clear what the code is doing. It will also help with metrics collection, as there is now one place where the scheduling decisions are taken. 
* The thread for redistributing capacity is started only if there is more than queue. 

None of the scheduling logic has changed. 


> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4980) Cleanup the Capacity Scheduler code

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4980:
--------------------------------

    Status: Patch Available  (was: Open)

> Cleanup the Capacity Scheduler code
> -----------------------------------
>
>                 Key: HADOOP-4980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4980
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>         Attachments: 4980.1.patch, 4980.2.patch
>
>
> Given the number of changes that have been made by different folks to the Capacity Scheduler code, the code needs to be cleaned up. Some comments and variable names are misleading, and the core logic is not in a central place, making it harder to understand. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.