Posted to yarn-issues@hadoop.apache.org by "Zian Chen (JIRA)" <ji...@apache.org> on 2018/06/27 05:53:00 UTC

[jira] [Comment Edited] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

    [ https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524608#comment-16524608 ] 

Zian Chen edited comment on YARN-8379 at 6/27/18 5:52 AM:
----------------------------------------------------------

Hi [~leftnoteasy], thanks for the comments.
 # This test case is intended to demonstrate that the selected candidates are actually killed once the custom timeout is reached. The snippet below shows the intent.
{code:java}
editPolicy.editSchedule();
Assert.assertEquals(4, editPolicy.getToPreemptContainers().size());

// Check live containers immediately: nothing happens before the timeout
Assert.assertEquals(39, schedulerApp1.getLiveContainers().size());

Thread.sleep(20*1000);
// Call editSchedule again: selected containers are killed
editPolicy.editSchedule();
waitNumberOfLiveContainersFromApp(schedulerApp1, 35);{code}

 # Totally understand your suggestion here. I originally tried defining a private member variable inside the PreemptionCandidatesSelector interface so that we don't need to pass curCandidates to every selector instance. However, TestCapacitySchedulerSurgicalPreemption#testPriorityPreemptionFromHighestPriorityQueueAndOldestContainer() failed, which made me realize that curCandidates is a "per round reinitialize" thing rather than a "per selector" thing: we should start from a clean curCandidates HashMap every time editSchedule is called. Otherwise, as in that UT, editSchedule is called multiple times while the selector remains the same instance, so curCandidates would still contain the containers selected in the previous round on the second editSchedule call. We could certainly add a method to PreemptionCandidatesSelector and call it explicitly to reset curCandidates each round, but that makes the code even harder to read. Any better suggestions here?
 # Is your suggestion to not update toPreempt on the fly but only update curCandidates, and then, after each round, update toPreempt right after selectCandidates finishes? If that understanding is correct, I think the two strategies are equivalent from a time-complexity point of view, since we always need to go through each tentatively selected candidate and add it to toPreempt with a duplicate check; but it does keep the status of selected containers in one place (curCandidates) and avoids inconsistency. We can discuss this part further; a rough sketch of what I mean is below this list.
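To make points 2 and 3 concrete, here is a minimal, self-contained sketch (plain Java with simplified String ids and a stand-in Selector interface, not the actual YARN types or the patch code) of "reset curCandidates per round, then merge it into toPreempt after each selectCandidates call":
{code:java}
import java.util.*;

public class PerRoundCandidatesSketch {
  // Simplified stand-in for PreemptionCandidatesSelector: a selector only
  // fills the per-round map it is given and never touches toPreempt.
  interface Selector {
    void selectCandidates(Map<String, Set<String>> curCandidates);
  }

  private final List<Selector> selectors = new ArrayList<>();

  // Accumulated containers to preempt (app id -> container ids).
  private final Map<String, Set<String>> toPreempt = new HashMap<>();

  // One scheduling-edit round.
  public void editSchedule() {
    // Point 2: a clean map every round ("per round reinitialize"), so a
    // selector instance reused across rounds never sees containers that
    // were selected in a previous round.
    Map<String, Set<String>> curCandidates = new HashMap<>();

    for (Selector selector : selectors) {
      selector.selectCandidates(curCandidates);

      // Point 3: merge into toPreempt right after selectCandidates finishes;
      // the Set takes care of the duplicate check.
      for (Map.Entry<String, Set<String>> e : curCandidates.entrySet()) {
        toPreempt.computeIfAbsent(e.getKey(), k -> new HashSet<>())
            .addAll(e.getValue());
      }
    }
    // toPreempt is then acted on (warn, and kill after the timeout) as before.
  }
}{code}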

I'll address the asflicense issue with the next patch. Thanks!


> Add an option to allow Capacity Scheduler preemption to balance satisfied queues
> --------------------------------------------------------------------------------
>
>                 Key: YARN-8379
>                 URL: https://issues.apache.org/jira/browse/YARN-8379
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Zian Chen
>            Priority: Major
>         Attachments: YARN-8379.001.patch, YARN-8379.002.patch, YARN-8379.003.patch, YARN-8379.004.patch, YARN-8379.005.patch, ericpayne.confs.tgz
>
>
> The existing capacity scheduler only supports preemption for an underutilized queue to reach its guaranteed resource. In addition to that, there is a requirement to get better balance between queues when all of them have reached their guaranteed resource but use different amounts beyond it.
> An example: 3 queues with capacity queue_a = 30%, queue_b = 30%, queue_c = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing scheduler preemption won't happen, but this is unfair to queue_a since queue_a has the same guaranteed resources as queue_b.
> Before YARN-5864, the capacity scheduler did additional preemption to balance queues. We changed the logic since it could preempt too many containers between queues when all queues are satisfied.
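For the example in the description, a minimal sketch of the imbalance (my reading of what balancing satisfied queues would target, with idle capacity shared in proportion to guarantees; not the algorithm from the patch):
{code:java}
public class BalanceExample {
  public static void main(String[] args) {
    // Guarantees and usage as fractions of the cluster, from the description;
    // queue_c (guaranteed 40%) is idle at time T.
    double guaranteedA = 0.30, guaranteedB = 0.30;
    double usedA = 0.30, usedB = 0.70;

    // If idle capacity were shared in proportion to guarantees, the two active
    // queues with equal guarantees would each ideally hold half the cluster.
    double activeGuarantee = guaranteedA + guaranteedB;
    double idealA = guaranteedA / activeGuarantee; // 0.50
    double idealB = guaranteedB / activeGuarantee; // 0.50

    // Balancing preemption would move queue_b back toward its ideal share.
    System.out.printf("queue_a: used=%.2f ideal=%.2f%n", usedA, idealA);
    System.out.printf("queue_b: used=%.2f ideal=%.2f over=%.2f%n",
        usedB, idealB, usedB - idealB);
  }
}{code}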


