Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2011/04/16 01:02:05 UTC

[jira] [Created] (MAPREDUCE-2441) regression: maximum limit of -1 doesn't appear to be unlimited anymore

regression: maximum limit of -1 doesn't appear to be unlimited anymore
----------------------------------------------------------------------

                 Key: MAPREDUCE-2441
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2441
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: benchmarks
    Affects Versions: 0.20.203.0
            Reporter: Allen Wittenauer
            Priority: Blocker


The math around the slot usage when maximum-capacity=-1 appears to be faulty.  See comments.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-2441.
-----------------------------------------

    Resolution: Won't Fix
    
> regression: maximum limit of -1 + user-lmit math appears to be off
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2441
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2441
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.203.0
>            Reporter: Allen Wittenauer
>            Priority: Critical
>         Attachments: capsched.xml
>
>
> The math around the slot usage when maximum-capacity=-1 appears to be faulty.  See comments.

[jira] [Commented] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032190#comment-13032190 ] 

Arun C Murthy commented on MAPREDUCE-2441:
------------------------------------------

Allen, I'm sorry I missed this ticket.

As we briefly discussed over IM previously, the CS in 0.20.203 is designed not to allow a single user to go over the natural limit of the queue. As the docs say, you'll need to set the user-limit-factor for the queue to allow a user to go over... I'm pretty sure I told you so in person ;)

[jira] [Commented] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032504#comment-13032504 ] 

Allen Wittenauer commented on MAPREDUCE-2441:
---------------------------------------------

Nope, not about user-limit-factor.  But doesn't this mean that the first jobs in an expanding queue can starve out jobs in another queue?  In other words, if I have:

job1 = max-lim -1 queue
job2 = max-lim -1 queue
job3 = max-lim % queue

job1 and job2 could take all slots before job3 gets executed, especially if they are submitted by the same user and that is the only user in the job submission queue.

[jira] [Commented] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032165#comment-13032165 ] 

Allen Wittenauer commented on MAPREDUCE-2441:
---------------------------------------------

Actually, it looks like queue spillage/task stealing doesn't work at all, whether it is -1 or not.  The problem code appears to be in assignSlotsToJob, which seems to have replaced the two-phase system of previous versions with a single phase.  This single phase performs the following check to determine the limit:

{code}
int limit =
      Math.min(
          Math.max(divideAndCeil(currentCapacity, activeUsers),
                   divideAndCeil(ulMin*currentCapacity, 100)),
          (int)(queueCapacity * ulMinFactor)
          );

{code}

In a two-queue system where one queue's maximum capacity is -1 and the other's is a fixed number, the maximum queue capacity ends up being set to the remainder.  Without a second pass, any additional slots from other queues are essentially ignored.
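
To make the effect of the quoted expression concrete, here is a minimal standalone sketch of the same arithmetic. The class name, the body of the divideAndCeil helper, and every number used are assumptions chosen for illustration; none of them come from capsched.xml or from the actual scheduler source.

```java
// Hypothetical sketch: reproduces only the arithmetic of the quoted check.
final class UserLimitSketch {

    // Assumed helper: integer division rounded up (returns 0 for a zero divisor).
    static int divideAndCeil(int a, int b) {
        return (b == 0) ? 0 : (a + b - 1) / b;
    }

    // Same shape as the expression quoted above; parameter names follow it.
    static int userLimit(int currentCapacity, int activeUsers, int ulMin,
                         int queueCapacity, float ulMinFactor) {
        return Math.min(
            Math.max(divideAndCeil(currentCapacity, activeUsers),
                     divideAndCeil(ulMin * currentCapacity, 100)),
            (int) (queueCapacity * ulMinFactor));
    }

    public static void main(String[] args) {
        // Illustrative values only: a queue whose natural share is 300 slots,
        // one active user, and a currentCapacity that has grown to 1000 slots.
        int queueCapacity = 300;

        // With a factor of 1 the outer Math.min caps the user at the
        // queue's natural share, however large currentCapacity is.
        System.out.println(userLimit(1000, 1, 100, queueCapacity, 1.0f));

        // Raising the factor to 2 doubles the cap.
        System.out.println(userLimit(1000, 1, 100, queueCapacity, 2.0f));
    }
}
```

In this sketch the second argument of the outer Math.min is the binding term: the user's limit tracks queueCapacity times the factor rather than the spare slots elsewhere in the cluster.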

[jira] [Commented] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020466#comment-13020466 ] 

Allen Wittenauer commented on MAPREDUCE-2441:
---------------------------------------------

or the queue limit.

So where does the 266 come from?  The job was a terasort job with 1000 map tasks.

[jira] [Updated] (MAPREDUCE-2441) regression: maximum limit of -1 doesn't appear to be unlimited anymore

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-2441:
----------------------------------------

    Attachment: capsched.xml

This is our capacity scheduler configuration.  On a test grid with 762 map slots, the first user running in the default queue only got 266 map slots.  This doesn't appear to be either the user limit or the max limit.  

[jira] [Updated] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-2441:
----------------------------------------

    Component/s:     (was: benchmarks)
                 contrib/capacity-sched

[jira] [Updated] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-2441:
----------------------------------------

    Summary: regression: maximum limit of -1 + user-lmit math appears to be off  (was: regression: maximum limit of -1 doesn't appear to be unlimited anymore)

[jira] [Updated] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-2441:
----------------------------------------

    Priority: Critical  (was: Blocker)

Changing this from a blocker, since no one but me apparently cares that the capacity scheduler doesn't actually work as advertised.

[jira] [Commented] (MAPREDUCE-2441) regression: maximum limit of -1 + user-lmit math appears to be off

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032173#comment-13032173 ] 

Allen Wittenauer commented on MAPREDUCE-2441:
---------------------------------------------

Actually, let me correct myself.  Task stealing does work--but in a sort of weird and unpredictable way.  Basically, an individual user is limited to the "natural" size of the queue they submitted to.  So if two users are in the same queue, that queue can steal up to 2x the queue size, etc.
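
The scaling described in this comment can be illustrated with a small sketch. It is a simplified, hypothetical model: the class and method names and all slot counts are invented for illustration, each user is assumed to be capped at the queue's natural size times a user-limit factor, and the model ignores maximum-capacity settings and demand from other queues.

```java
// Hypothetical model of "each user is capped at the natural queue size".
final class QueueStealSketch {

    // Upper bound on the slots one queue could occupy when every active
    // user is individually capped at naturalQueueSlots * userLimitFactor.
    static int maxQueueUsage(int clusterSlots, int naturalQueueSlots,
                             float userLimitFactor, int activeUsers) {
        int perUserCap = (int) (naturalQueueSlots * userLimitFactor);
        long total = (long) perUserCap * activeUsers;
        // The queue can never use more slots than the cluster has.
        return (int) Math.min(total, clusterSlots);
    }

    public static void main(String[] args) {
        int clusterSlots = 1000;      // illustrative cluster size
        int naturalQueueSlots = 300;  // illustrative natural queue share

        for (int users = 1; users <= 4; users++) {
            System.out.println(users + " user(s): "
                + maxQueueUsage(clusterSlots, naturalQueueSlots, 1.0f, users));
        }
    }
}
```

Under these assumptions a single user never gets past the natural queue size, while the queue as a whole grows with the number of active users until it runs into the cluster size.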
