Posted to dev@slider.apache.org by "kyungwan nam (JIRA)" <ji...@apache.org> on 2015/10/30 06:29:27 UTC

[jira] [Commented] (SLIDER-939) flex down does not cancel the outstanding request

    [ https://issues.apache.org/jira/browse/SLIDER-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981940#comment-14981940 ] 

kyungwan nam commented on SLIDER-939:
-------------------------------------

Hi.
I hit the same problem when "yarn.memory" is not a multiple of "yarn.scheduler.minimum-allocation-mb".
I think this issue may be caused by SLIDER-955.
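
To illustrate why the multiple matters, here is a minimal sketch of the round-up that the YARN scheduler applies to memory requests, as I understand the default behaviour. The function name and the numbers are hypothetical, and the exact normalization depends on the scheduler and resource calculator in use. It only shows that the size asked for and the size the RM records can differ; it is not the Slider code path.

```python
def normalize_memory(requested_mb: int, minimum_allocation_mb: int) -> int:
    """Round a memory request up to the next multiple of the minimum allocation.

    Hypothetical helper: mirrors the round-up the scheduler is assumed to apply,
    not an actual YARN or Slider API.
    """
    if minimum_allocation_mb <= 0:
        raise ValueError("minimum_allocation_mb must be positive")
    if requested_mb <= 0:
        # Assumption: a non-positive request is raised to the minimum.
        return minimum_allocation_mb
    # Integer ceiling division, then scale back up to MB.
    steps = (requested_mb + minimum_allocation_mb - 1) // minimum_allocation_mb
    return steps * minimum_allocation_mb


# yarn.memory = 1500 with a 1024 MB minimum is not a multiple, so it is rounded up.
print(normalize_memory(1500, 1024))
# yarn.memory = 2048 is already a multiple and is left unchanged.
print(normalize_memory(2048, 1024))
```

If the application tracks the request as 1500 MB while the RM holds it as 2048 MB, the two sizes no longer match, which might explain why the cancel does not find the outstanding request. That last part is a guess on my side.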

> flex down does not cancel the outstanding request
> -------------------------------------------------
>
>                 Key: SLIDER-939
>                 URL: https://issues.apache.org/jira/browse/SLIDER-939
>             Project: Slider
>          Issue Type: Bug
>          Components: core
>    Affects Versions: Slider 0.80
>         Environment: Hadoop 2.7.1 
> Slider 0.80.0
>            Reporter: Youjie Chen
>            Assignee: Steve Loughran
>              Labels: patch
>             Fix For: Slider 0.90
>
>
> I run a Slider app on a 6-node cluster. To ensure there is only one component (worker) instance on each node, I set yarn.memory to 51% of the total memory.
> Then I flexed up to 7 workers, which leaves one outstanding worker request that will never be met; this is expected.
> Then I flexed back down to 6 workers, and any container request for any job was blocked even though there was plenty of memory and cores for the job. The RM log shows this message repeatedly:
> capacity.CapacityScheduler (CapacityScheduler.java:allocateContainersToNode(1240)) - Skipping scheduling since node test.example.com:45454 is reserved by application appattempt_1442384698868_0008_000001
> It seems the outstanding requests are not actually cancelled in the container request queue but are still being requested.
> After I flexed down to 5 workers, the other blocked jobs could run.
> This is related to JIRA https://issues.apache.org/jira/browse/SLIDER-490
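
For reference, the arithmetic behind the 51% setting in the description can be sketched as follows. This is only an illustration under assumptions: identical nodes, memory as the only constraint, nothing else running on the cluster, and a hypothetical 8192 MB per node. The function names are made up for the example.

```python
def containers_per_node(node_mb: int, container_mb: int) -> int:
    """How many containers of the given size fit on one node (memory only)."""
    return node_mb // container_mb


def outstanding_requests(nodes: int, node_mb: int, container_mb: int, desired: int) -> int:
    """Requests that cannot be placed once every node is full."""
    capacity = nodes * containers_per_node(node_mb, container_mb)
    return max(0, desired - capacity)


NODE_MB = 8192                      # hypothetical node size
CONTAINER_MB = NODE_MB * 51 // 100  # 51% of the node, i.e. 4177 MB

# A container larger than half the node means only one fits per node.
print(containers_per_node(NODE_MB, CONTAINER_MB))
# 6 nodes, 7 workers requested: one request stays outstanding.
print(outstanding_requests(6, NODE_MB, CONTAINER_MB, 7))
# Back at 6 workers there should be nothing left outstanding.
print(outstanding_requests(6, NODE_MB, CONTAINER_MB, 6))
```

So after flexing down to 6 the expected number of outstanding requests is zero, and the reservation seen in the RM log suggests the seventh request was still alive.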



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)