You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Wangda Tan (JIRA)" <ji...@apache.org> on 2017/02/01 21:30:51 UTC

[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

    [ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848962#comment-15848962 ] 

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks [~sunilg] for updating the patch.

Last wave (hopefully :p) of comments for the latest patch

1) updateUserResourceUsage:
- javadocs parameter need to change
- remove LOG.info for debug

2) incResourceUsagePerUser/decResourceUsagePerUser are mostly identical, suggest to add the "allocated" parameter and rename it to updateResourceUsagePerUser. And writeLock is not necessary

3) getComputedResourceLimitForActiveUsers:
- Why {{userLimitNeedsRecompute}} is called here? Will it make the following {{isRecomputeNeeded}} to always return true?
My guess is, now we have only one localVersionOfUsersState for both of active user and total user. If we have two such map, one for active user and one for total user, it could solve the problem, correct?

4) isRecomputeNeeded:
- When userLimitPerSchedulingMode gonna be null?
- I'm still not quite sure about why {{userLimitPerSchedulingMode}} is required for {{isRecomputeNeeded}}:
{{getLocalVersionOfUsersState}} returns -1 when userLimitPerSchedulingMode doesn't contain schedulingMode, correct?
- And also, we don't need {{latestVersionOfUserCount}}, instead we should call {{latestVersionOfUsersState.get()}}.

5) So a summary of 3/4: I think we need two maps for local version, and isRecomputeNeed should take 3 parameters: schedulingMode, partition, and {{boolean activeUsers}}. Existing logic looks not correct to me: if version updated to 2 for partition=x, and scheduling_mode=y; then we get user-limit for active-user/total-user; and then version update to 3 for partition=x and scheduling_mode=y; then we get user-limit for active-user/total-user again, the 2nd time UL of total-user will not be updated.

6) getLatestVersionOfUsersState is too simple to be a method, better to remove.

7) userLimitNeedsRecompute:
Need to consider value becomes negative, and we use "-1" as default value for "not found" local version, we should make sure value is always >= 0.
I think you can do things like:
{code}
int x = version.incrementAndGet();
if (x < 0) {
    x = version.get();
	while (x < 0 && !version.compareAndSet(x, 0)) {
		x = version.get();
	}
}
{code}

8) Possible redundant null checks:
I'm not sure if they're required, we can keep them to make RM not crash, but highly suggest to print warning:
- incResourceUsagePerUser/decResourceUsagePerUser: {{totalResourceUsageForUsers}}

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with a write lock. To improve performance, this tickets is focussing on moving user-limit calculation out of heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org