You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2009/01/17 02:43:59 UTC

[jira] Created: (HADOOP-5075) Potential infinite loop in updateMinSlots

Potential infinite loop in updateMinSlots
-----------------------------------------

                 Key: HADOOP-5075
                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/fair-share
            Reporter: Matei Zaharia
            Priority: Blocker
         Attachments: hadoop-5075.patch

We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Attachment: hadoop-5075-v2.patch

I think I found the cause for this now. The problem was with the weight normalization we did in HADOOP-4789 to have equal shares per pools: The map weights were also being used for reduces. The reason this caused a problem is because sometimes the map weight could be infinity if the job had no runnable maps, and then we divided by this in updateMinSlots when calculating guaranteed shares for reduces, leading to a 0 in both the ceil and the floor as I said. This patch fixes that bug and also ensures that no infinities get introduced.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>         Attachments: hadoop-5075-v2.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665540#action_12665540 ] 

Joydeep Sen Sarma commented on HADOOP-5075:
-------------------------------------------

question - regarding the 'break' in the slotsLeft == oldSlots

this doesn't look correct to me - it seems that there is no guarantee that all available slots are distributed in one round. and that is why earlier we had a for loop over the slots. but now we are claiming that by going over the jobs one last time - we will be able to distribute all the slots?

The basic problem seems to be:

             int share = (int) Math.ceil(oldSlots * weight / totalWeight);
              slotsLeft = giveMinSlots(job, type, slotsLeft, share);

I believe that the share computed is quite likely to be less than the maximum number of slots that the task can consume. So going from 'floor' to 'ceil' may not be enough to guarantee that slots get consumed (and certainly not enough to consume that *all* the slots left get consumed).

my gut feel is that the correct solution (when oldSlots == slotsLeft) should be something that takes into account the max tasks that a job can consume (as opposed to it's weighted share only). 


> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Attachment: hadoop-5075-v3.patch

Updated patch to do this in a more consistent way with how the scheduler works already, setting weights to 0 when the job is not using a particular task type. Also added a unit test.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Fix Version/s: 0.21.0
                   0.20.0
                   0.19.1

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666048#action_12666048 ] 

Matei Zaharia commented on HADOOP-5075:
---------------------------------------

The Eclipse classpath warning is the following, and it happens with a clean version of trunk too:

     [exec] FAILED. Some jars are not declared in the Eclipse project.
     [exec]   Declared jars: build/ivy/lib/Hadoop/common/commons-codec-1.3.jar



> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665717#action_12665717 ] 

Matei Zaharia commented on HADOOP-5075:
---------------------------------------

That makes a lot of sense. I'll add a warning-level log message if there are slots left after the ceil loop.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665730#action_12665730 ] 

dhruba borthakur commented on HADOOP-5075:
------------------------------------------

+1 Code looks good. It would be nice if you can pot the results of "ant test-patch" to this JIRA and then commit this to all releases starting from 0.19. Thanks a bunch.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HADOOP-5075:
----------------------------------------

    Assignee: Matei Zaharia

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665659#action_12665659 ] 

Matei Zaharia commented on HADOOP-5075:
---------------------------------------

By the way, the reason I think the break is not incorrect is the following: oldSlots will equal slotsLeft after the floor loop only if the number of slots we had left to distribute was so small that no job had its share of the total above 0.5. In this case, the ceil loop will just add +1 slot to the first few jobs in order of weight and then run out of slots to distribute, and the outer loop would've exited anyway with slotsLeft = 0. Remember that slotsLeft is the number of guaranteed slots for the pool that we have to distribute, so it's bounded and way smaller than the number of slots a job could potentially consume (it might get some of those too, but not as guaranteed share, just as fair share). Also, the jobs list that we iterate over has only jobs that are not at full capacity and can thus all use an extra slot, so we don't "lose" slots by going through the list and seeing that actually some jobs can't use any slots.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Status: Patch Available  (was: Open)

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Fix Version/s:     (was: 0.19.1)

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Attachment: hadoop-5075-v4.patch

Here's a new patch. If there are no objections I'll commit this because it fixes a behavior that's very broken in the current trunk.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Attachment: hadoop-5075.patch

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>         Attachments: hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665653#action_12665653 ] 

Matei Zaharia commented on HADOOP-5075:
---------------------------------------

Yup, the break should be unnecessary now that the infinities and NaN's have been fixed. I left it in to prevent an infinite loop if some other bug later introduces the same kind of problem, but I'm not sure which is the best way to go from a software engineering point of view - prevent the lockup of a critical system component and hope that users will notice any infinities on the calculations page, or allow the lockup to make sure that any calculation problems are noticed. Any thoughts on this?

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665554#action_12665554 ] 

Joydeep Sen Sarma commented on HADOOP-5075:
-------------------------------------------

on second thoughts - the original ceil and floor logic should always make progress. completely buy the theory that the flawed mapWeightSum was causing problems.

if that is indeed the case - the break seems unnecessary (and incorrect - right?)

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666025#action_12666025 ] 

dhruba borthakur commented on HADOOP-5075:
------------------------------------------

Hi Matei, if you can run "ant test-patch" on a new checked out version of trunk without your patch, you can verify if the Eclipse warning is beucase of your patch or not.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665714#action_12665714 ] 

dhruba borthakur commented on HADOOP-5075:
------------------------------------------

One option would be to log something when the "break" exited early, if that case ever occurs. This will ensure that production systems do not get impacted by an infinite loop, at the same time inspecting the log will tell us that the software is not running optimally.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666009#action_12666009 ] 

Matei Zaharia commented on HADOOP-5075:
---------------------------------------

Here's the test-patch output:

     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.

Pretty sure the Eclipse classpath warning is due to the switch to Ivy, since I don't change the classpath, so I'm going to commit this.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Status: Patch Available  (was: Open)

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>         Attachments: hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I've committed this fix into 0.20 and trunk. It didn't affect 0.19, I had mislabeled the JIRA originally.

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075-v4.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5075) Potential infinite loop in updateMinSlots

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Status: Open  (was: Patch Available)

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exists if it hasn't managed to assign any slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.