Posted to common-dev@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2009/01/20 21:19:02 UTC

[jira] Commented: (HADOOP-5075) Potential infinite loop in updateMinSlots

    [ https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665540#action_12665540 ] 

Joydeep Sen Sarma commented on HADOOP-5075:
-------------------------------------------

Question regarding the 'break' in the slotsLeft == oldSlots check:

This doesn't look correct to me. There seems to be no guarantee that all available slots are distributed in one round, which is why we previously had a for loop over the slots. Are we now claiming that going over the jobs one last time will distribute all the slots?

The basic problem seems to be:

    int share = (int) Math.ceil(oldSlots * weight / totalWeight);
    slotsLeft = giveMinSlots(job, type, slotsLeft, share);

I believe that the share computed is quite likely to be less than the maximum number of slots that the job can consume. So going from 'floor' to 'ceil' may not be enough to guarantee that slots get consumed (and certainly not enough to guarantee that *all* the slots left get consumed).

My gut feel is that the correct solution (when oldSlots == slotsLeft) should be something that takes into account the max tasks that a job can consume (as opposed to its weighted share only).
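
To make the concern concrete, here is a minimal sketch of a weighted distribution loop. It is a simplified model and not the actual scheduler code: the class MinSlotsSketch, the methods onePass and distribute, and the per-job demand array are all hypothetical names introduced for illustration. It only shows that a single 'ceil' pass can leave slots unassigned when one job's demand is smaller than its share, and that repeating passes until no progress is made picks them up.

```java
import java.util.Arrays;

/** Simplified, hypothetical model of weighted slot distribution; NOT the Hadoop code. */
class MinSlotsSketch {

    /**
     * One pass over the jobs. Each job that still needs slots is offered
     * ceil(oldSlots * weight / totalWeight), capped by its remaining demand
     * and by the slots still available. Returns the slots left afterwards.
     */
    public static int onePass(int slotsLeft, double[] weight, int[] demand, int[] given) {
        int oldSlots = slotsLeft;
        double totalWeight = 0;
        for (int i = 0; i < weight.length; i++) {
            if (demand[i] > given[i]) {
                totalWeight += weight[i]; // only jobs that can still use slots count
            }
        }
        if (totalWeight == 0) {
            return slotsLeft; // nobody can use more slots
        }
        for (int i = 0; i < weight.length && slotsLeft > 0; i++) {
            int need = demand[i] - given[i];
            if (need <= 0) {
                continue;
            }
            int share = (int) Math.ceil(oldSlots * weight[i] / totalWeight);
            int grant = Math.min(Math.min(share, need), slotsLeft);
            given[i] += grant;
            slotsLeft -= grant;
        }
        return slotsLeft;
    }

    /** Repeats passes until the slots run out or a pass assigns nothing. */
    public static int[] distribute(int slots, double[] weight, int[] demand) {
        int[] given = new int[weight.length];
        int slotsLeft = slots;
        while (slotsLeft > 0) {
            int before = slotsLeft;
            slotsLeft = onePass(slotsLeft, weight, demand, given);
            if (slotsLeft == before) {
                break; // no progress, so looping further cannot help
            }
        }
        return given;
    }

    public static void main(String[] args) {
        double[] weight = {1, 1};
        int[] demand = {1, 10}; // the first job can only use one slot
        int[] given = new int[2];
        System.out.println("left after one pass: " + onePass(5, weight, demand, given));
        System.out.println("after repeated passes: "
                + Arrays.toString(distribute(5, weight, demand)));
    }
}
```

With 5 slots, equal weights, and demands of 1 and 10, a single pass leaves one slot unassigned even though the second job could use it; repeated passes end with grants of 1 and 4. Because the break here fires only when a pass assigns nothing, the loop still terminates.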


> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>             Fix For: 0.19.1, 0.20.0, 0.21.0
>
>         Attachments: hadoop-5075-v2.patch, hadoop-5075-v3.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the scheduler was repeating infinitely. This might happen if, due to rounding, we are unable to assign the last few slots in a pool. This patch adds a break statement to ensure that the loop exits if it hasn't managed to assign any slots.
