Posted to yarn-issues@hadoop.apache.org by "Weiwei Yang (JIRA)" <ji...@apache.org> on 2018/11/15 10:36:00 UTC

[jira] [Updated] (YARN-8833) compute shares may lock the scheduling process

     [ https://issues.apache.org/jira/browse/YARN-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-8833:
------------------------------
    Description: 
When w2rRatio is used to compute the fair share, an int overflow can occur and send the computation into an infinite loop.

Because the thread computing the shares holds the writeLock, this can block the scheduling thread.

This issue occurred in a production environment with 8500 nodes, and we have already fixed it.

 

Added 2018-10-29 to elaborate on the problem:

/**
 * Compute the resources that would be used given a weight-to-resource ratio
 * w2rRatio, for use in the computeFairShares algorithm as described in #
 */
private static int resourceUsedWithWeightToResourceRatio(double w2rRatio,
    Collection<? extends Schedulable> schedulables, String type) {
  int resourcesTaken = 0;
  for (Schedulable sched : schedulables) {
    int share = computeShare(sched, w2rRatio, type);
    resourcesTaken += share;
  }
  return resourcesTaken;
}

The variable resourcesTaken is an int. It accumulates the results of computeShare(Schedulable sched, double w2rRatio, String type), each of which is a value between the min share and max share of a queue.

For example, if several queues each have min share = max share = Integer.MAX_VALUE, the sum exceeds the int range and wraps around; with two such queues, resourcesTaken becomes -2.
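The wrap-around can be reproduced in isolation. The following is a minimal sketch, not code from the patch: the class ShareOverflowDemo and its methods are hypothetical names, and the shares are passed in directly instead of being computed from Schedulable objects. It compares an int accumulator, as used above, with a long accumulator.

```java
public class ShareOverflowDemo {
    // Mirrors the accumulation described above: an int running total.
    static int sumAsInt(int[] shares) {
        int total = 0;
        for (int share : shares) {
            total += share; // silently wraps around on overflow
        }
        return total;
    }

    // Same accumulation in a long, which has room for the true sum.
    static long sumAsLong(int[] shares) {
        long total = 0L;
        for (int share : shares) {
            total += share;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical queues with min share = max share = Integer.MAX_VALUE.
        int[] twoQueues = {Integer.MAX_VALUE, Integer.MAX_VALUE};
        System.out.println(sumAsInt(twoQueues));  // prints -2
        System.out.println(sumAsLong(twoQueues)); // prints 4294967294
    }
}
```

The wrapped value depends on how many shares are added, so the sign of the result is not predictable in general; only the long total is reliable.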

When resourceUsedWithWeightToResourceRatio(double w2rRatio, Collection<? extends Schedulable> schedulables, String type) returns a negative number, the loop in computeSharesInternal(), which runs while the scheduler lock is held, may never exit.

 

// org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares
while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
    < totalResource) {
  rMax *= 2.0;
}

This can block the scheduling thread.
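One way to keep the doubling loop from spinning is to saturate the total at Integer.MAX_VALUE instead of letting it wrap. The sketch below illustrates that idea under stated assumptions and is not necessarily what the attached patch does: Queue is a hypothetical stand-in for Schedulable, computeShare is simplified to clamp weight * w2rRatio between min and max share, and the loop carries an iteration cap purely so the demonstration terminates.

```java
import java.util.Arrays;
import java.util.List;

public class SaturatingShareDemo {
    /** Hypothetical stand-in for a Schedulable: weight, min share, max share. */
    static final class Queue {
        final double weight;
        final int minShare;
        final int maxShare;

        Queue(double weight, int minShare, int maxShare) {
            this.weight = weight;
            this.minShare = minShare;
            this.maxShare = maxShare;
        }
    }

    // Simplified share: weight * ratio, clamped between min and max share.
    static int computeShare(Queue q, double w2rRatio) {
        double share = q.weight * w2rRatio;
        share = Math.max(share, q.minShare);
        share = Math.min(share, q.maxShare);
        return (int) share;
    }

    // int accumulator, as in the snippet above: may wrap to a negative value.
    static int resourceUsedWrapping(double w2rRatio, List<Queue> queues) {
        int taken = 0;
        for (Queue q : queues) {
            taken += computeShare(q, w2rRatio);
        }
        return taken;
    }

    // long accumulator that saturates at Integer.MAX_VALUE.
    static int resourceUsedSaturating(double w2rRatio, List<Queue> queues) {
        long taken = 0L;
        for (Queue q : queues) {
            taken += computeShare(q, w2rRatio);
            if (taken >= Integer.MAX_VALUE) {
                return Integer.MAX_VALUE;
            }
        }
        return (int) taken;
    }

    // Runs the doubling loop; returns the number of doublings performed,
    // or -1 if the loop was still running after maxIterations.
    static int doublingsNeeded(List<Queue> queues, int totalResource,
                               boolean saturate, int maxIterations) {
        double rMax = 1.0;
        for (int i = 0; i < maxIterations; i++) {
            int used = saturate
                ? resourceUsedSaturating(rMax, queues)
                : resourceUsedWrapping(rMax, queues);
            if (used >= totalResource) {
                return i;
            }
            rMax *= 2.0;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Queue> queues = Arrays.asList(
            new Queue(1.0, Integer.MAX_VALUE, Integer.MAX_VALUE),
            new Queue(1.0, Integer.MAX_VALUE, Integer.MAX_VALUE));
        System.out.println(doublingsNeeded(queues, 1_000_000, false, 100)); // prints -1
        System.out.println(doublingsNeeded(queues, 1_000_000, true, 100));  // prints 0
    }
}
```

With the wrapping total, both shares are pinned at their max share, so doubling rMax never changes the sum of -2 and the condition stays true. With the saturating total, the first evaluation already reaches totalResource and the loop exits at once.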


> compute shares may lock the scheduling process
> -----------------------------------------------
>
>                 Key: YARN-8833
>                 URL: https://issues.apache.org/jira/browse/YARN-8833
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: liyakun
>            Assignee: liyakun
>            Priority: Major
>         Attachments: YARN-8833.patch
>


