Posted to yarn-issues@hadoop.apache.org by "liyakun (JIRA)" <ji...@apache.org> on 2018/10/29 06:30:00 UTC
[jira] [Updated] (YARN-8833) compute shares may lock the scheduling process
[ https://issues.apache.org/jira/browse/YARN-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyakun updated YARN-8833:
--------------------------
Description:
Computing fair shares with w2rRatio may trigger an int overflow and send the computation into an infinite loop.
Because the thread computing the shares holds the writeLock, it can block the scheduling thread.
This issue occurred in a production environment with 8500 nodes, and we have already fixed it.
2018-10-29: added the following details.
  /**
   * Compute the resources that would be used given a weight-to-resource ratio
   * w2rRatio, for use in the computeFairShares algorithm as described in #
   */
  private static int resourceUsedWithWeightToResourceRatio(double w2rRatio,
      Collection<? extends Schedulable> schedulables, String type) {
    int resourcesTaken = 0; // int accumulator: can overflow and wrap around
    for (Schedulable sched : schedulables) {
      int share = computeShare(sched, w2rRatio, type);
      resourcesTaken += share;
    }
    return resourcesTaken;
  }
The variable resourcesTaken is an int, and it accumulates the results of
computeShare(Schedulable sched, double w2rRatio, String type), each of which lies between the queue's min share and max share.
For example, if there are 3 queues, each with min share = max share = Integer.MAX_VALUE, the sum exceeds the int range and wraps around:
after the second queue resourcesTaken is already negative (-2), and after the third it is 2147483645, far below the true total.
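Java int addition wraps around silently instead of failing. The standalone sketch below reproduces only the arithmetic of the accumulation above; it replaces Schedulable and computeShare with a hypothetical array of shares that are already clamped to Integer.MAX_VALUE, so it is an illustration rather than Hadoop code:

```java
// Hypothetical reproduction, not Hadoop code: each queue's share is a plain
// int that is assumed to be already clamped to min share = max share.
class IntOverflowDemo {

    // Mirrors the int accumulation in resourceUsedWithWeightToResourceRatio.
    static int sumShares(int[] shares) {
        int resourcesTaken = 0;
        for (int share : shares) {
            resourcesTaken += share; // wraps around silently on overflow
        }
        return resourcesTaken;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(sumShares(new int[] {max}));           // 2147483647
        System.out.println(sumShares(new int[] {max, max}));      // -2
        System.out.println(sumShares(new int[] {max, max, max})); // 2147483645
    }
}
```

The true total for three such queues is 6442450941, so neither wrapped result reflects the resources actually taken. Math.addExact(int, int) would instead throw an ArithmeticException at the first overflow, making this kind of bug visible rather than silent.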
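When resourceUsedWithWeightToResourceRatio(double w2rRatio, Collection<? extends Schedulable> schedulables, String type) returns an overflowed value (for example, a negative number) that stays below totalResource, the following loop may never exit, because doubling rMax cannot increase shares that are already clamped at their max share: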
  // org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares
  while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
      < totalResource) {
    rMax *= 2.0;
  }
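The report does not say how the issue was fixed. One possible guard, sketched below under stated assumptions, is to accumulate in a long and saturate at Integer.MAX_VALUE so the sum can never wrap below totalResource. Queue, computeShare, resourceUsedSaturating and findRMax are hypothetical simplified stand-ins, not the FairScheduler API, and the sketch assumes totalResource never exceeds the saturated sum of the max shares (otherwise the doubling loop still could not finish):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hadoop code: Queue and the methods below are
// simplified stand-ins for Schedulable and ComputeFairShares.
class SaturatingShareDemo {

    // Hypothetical stand-in for a Schedulable: a weight plus min/max shares.
    static final class Queue {
        final double weight;
        final int minShare;
        final int maxShare;

        Queue(double weight, int minShare, int maxShare) {
            this.weight = weight;
            this.minShare = minShare;
            this.maxShare = maxShare;
        }
    }

    // Share of one queue at ratio w2rRatio, clamped to [minShare, maxShare].
    static int computeShare(Queue q, double w2rRatio) {
        double share = q.weight * w2rRatio;
        share = Math.max(share, q.minShare);
        share = Math.min(share, q.maxShare);
        return (int) share;
    }

    // Sums the shares in a long and saturates at Integer.MAX_VALUE,
    // so the result can never wrap around to a smaller value.
    static int resourceUsedSaturating(double w2rRatio, List<Queue> queues) {
        long resourcesTaken = 0;
        for (Queue q : queues) {
            resourcesTaken += computeShare(q, w2rRatio);
            if (resourcesTaken >= Integer.MAX_VALUE) {
                return Integer.MAX_VALUE;
            }
        }
        return (int) resourcesTaken;
    }

    // Doubles rMax until the shares cover totalResource, like the loop above.
    // Assumes totalResource <= the saturated sum of all max shares.
    static double findRMax(List<Queue> queues, int totalResource) {
        double rMax = 1.0;
        while (resourceUsedSaturating(rMax, queues) < totalResource) {
            rMax *= 2.0;
        }
        return rMax;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        List<Queue> queues = Arrays.asList(
            new Queue(1.0, max, max), new Queue(1.0, max, max), new Queue(1.0, max, max));
        System.out.println(resourceUsedSaturating(1.0, queues)); // 2147483647
        System.out.println(findRMax(queues, max));                // 1.0
    }
}
```

Saturating inside the accumulator keeps the int return type, so callers such as the doubling loop need no changes; returning a long instead would also work but touches every caller.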
> compute shares may lock the scheduling process
> -----------------------------------------------
>
> Key: YARN-8833
> URL: https://issues.apache.org/jira/browse/YARN-8833
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Reporter: liyakun
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)