You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Jun Gong (JIRA)" <ji...@apache.org> on 2016/08/02 10:44:20 UTC
[jira] [Comment Edited] (YARN-5333) Some recovered apps are put into default queue when RM HA

    [ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403741#comment-15403741 ] 

Jun Gong edited comment on YARN-5333 at 8/2/16 10:43 AM:
---------------------------------------------------------

Thanks [~rohithsharma], [~jianhe] for the review and comments!

bq. 1. Should private boolean isTransitingToActive = false; is volatile?
Yes, it needs be volatile. I'll update it.

{quote}
2. Since none of the refreshXXX methods are synchronized, patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for other admin calls. Until RM transition-to-active completely, explicit admin commands should not allowed to refresh. I think, we should incorporate similar to refreshAdminAcl method.
{quote}
How about adding {{synchronized}} to each refresh functions? It avoids adding more logic. When admin command comes, we could just call corresponding refresh functions. I think it does not matter to call refresh function many times.

bq. 3. I think flag checkRMHAState can be passed to method checkRMStatus.
I was thinking it. If adding checkRMHAState to checkRMStatus, we need add this parameter(checkRMHAState) to all refresh functions too(which is similar to refreshAdminAcl), there are a lot of places that call refresh functions. It might be better to just add a check before checkRMStatus?

bq. I think if you can simulate test for generally instead of specific to fair scheduler, this test can be moved to class TestRMHA. There is already test TestRMHA#testTransitionedToActiveRefreshFail, probable the same test can be changed?
Thanks. I'll update the test case.

{quote}
Instead of reusing the existing refreshAll method, I checked each refresh method, it should be cleaner to just create a new method which includes all necessary reconfig steps. This also avoids unnecessary audit logs, acl checks.
{quote}
Yes, it will be more clear to add a new method to include all reconfig steps. My doubt is that there will be two places that do similar reconfig things(the one is in refresh functions, the other is in the new added method). Then we need to modify both places if there is some change for one of them. I will try to refactor those refresh functions.


was (Author: hex108):
Thanks [~rohithsharma], [~jianhe] for the review and comments!

bq. 1. Should private boolean isTransitingToActive = false; is volatile?
Yes, it needs be volatile. I'll update it.

{quote}
2. Since none of the refreshXXX methods are synchronized, patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for other admin calls. Until RM transition-to-active completely, explicit admin commands should not allowed to refresh. I think, we should incorporate similar to refreshAdminAcl method.
{quote}
How about adding {{synchronized}} to each refresh functions? It avoids adding more logic. When admin command comes, we could just call corresponding refresh functions. I think it does not matter to call refresh function many times.

bq. 3. I think flag checkRMHAState can be passed to method checkRMStatus.
I was thinking it. If adding checkRMHAState to checkRMStatus, we need add this parameter(checkRMHAState) to all refresh functions too(which is similar to refreshAdminAcl), there are a lot of places that call refresh functions. It might be better to just add a check before checkRMStatus?

bq. I think if you can simulate test for generally instead of specific to fair scheduler, this test can be moved to class TestRMHA. There is already test TestRMHA#testTransitionedToActiveRefreshFail, probable the same test can be changed?
Thanks. I'll update the test case.

{quote}
Instead of reusing the existing refreshAll method, I checked each refresh method, it should be cleaner to just create a new method which includes all necessary reconfig steps. This also avoids unnecessary audit logs, acl checks.
{quote}
Yes, it will be more clear to add a new method to include all reconfig steps. My doubt is that there will be two places that do similar reconfig things(the one is in refresh functions, the other is in the new added method). Then we need to modify both places if there is some change for one of them.

> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>
>                 Key: YARN-5333
>                 URL: https://issues.apache.org/jira/browse/YARN-5333
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-5333.01.patch, YARN-5333.02.patch, YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch
>
>
> Enable RM HA and use FairScheduler, {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, {{yarn.scheduler.fair.user-as-default-queue}} is set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After RMs are running, change both RM's file {{etc/hadoop/fair-scheduler.xml}}, then add some queues.
> 3. Submit some apps to the new added queues.
> 4. Stop the active RM, then the standby RM will transit to active and recover apps.
> However the new active RM will put recovered apps into default queue because it might have not loaded the new {{fair-scheduler.xml}}. We need call {{initScheduler}} before start active services or bring {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems it is also important for other scheduler*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org