Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2021/08/10 15:50:00 UTC

[jira] [Commented] (YARN-10873) Graceful Decommission ignores launched containers and gets deactivated before timeout

    [ https://issues.apache.org/jira/browse/YARN-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396752#comment-17396752 ] 

Hadoop QA commented on YARN-10873:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 45s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue}  0m  1s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |  | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} |  | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 44s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 48s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  0s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 53s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 11s{color} |  | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 51s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} |  | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 56s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} |  | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 47s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 38s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 49s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} |  | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} |  | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 50s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 46s{color} |  | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 23s{color} | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3287/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt] | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 34s{color} |  | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m  1s{color} |  | {color:black}{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMNodeTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3287/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3287 |
| JIRA Issue | YARN-10873 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 37e1f99808bf 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f405c1f17dae5a88aca41c5ab631708444b74453 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
|  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3287/1/testReport/ |
| Max. process+thread count | 945 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3287/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Graceful Decommission ignores launched containers and gets deactivated before timeout
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-10873
>                 URL: https://issues.apache.org/jira/browse/YARN-10873
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 3.3.1
>            Reporter: Prabhu Joseph
>            Assignee: Srinivas S T
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A node under graceful decommission is deactivated before the timeout even though containers have been launched on it.
> On a status update from a node in the DECOMMISSIONING state, the RM transitions the node to DECOMMISSIONED before the timeout if the node has no running applications. The set of running applications is built from the container statuses reported by the NodeManager. We have observed containers being launched on a NodeManager while, at the same time, the ResourceManager forcefully decommissions the node.
> This affects Livy interactive jobs, which support only one application attempt.
> We suggest checking FiCaSchedulerNode for launched containers to determine whether or not to forcefully decommission the node.
> {code}
>   public static class StatusUpdateWhenHealthyTransition implements
>       MultipleArcTransition<RMNodeImpl, RMNodeEvent, NodeState> {
>     @Override
>     public NodeState transition(RMNodeImpl rmNode, RMNodeEvent event) {
>       // ... (code elided in the original report)
>       if (isNodeDecommissioning) {
>         List<ApplicationId> keepAliveApps = statusEvent.getKeepAliveAppIds();
>         if (rmNode.runningApplications.isEmpty() &&
>             (keepAliveApps == null || keepAliveApps.isEmpty())) {
>           RMNodeImpl.deactivateNode(rmNode, NodeState.DECOMMISSIONED);
>           return NodeState.DECOMMISSIONED;
>         }
>       }
>       // ... (remainder of the method omitted)
> {code}
> *ResourceManager Logs:*
> {code}
> 2021-06-16 08:45:04,140 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1623830067124_0382_000001
> 2021-06-16 08:45:04,141 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1623830067124_0382_01_000001, AllocationRequestId: 0, Version: 0, NodeId: node1:34753, NodeHttpAddress: 927a9ef942b24b1eaa0e99c39d4e73f90224b902983:8042, Resource: <memory:29696, vCores:4>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.1.2.3:34753 }, ExecutionType: GUARANTEED, ] for AM appattempt_1623830067124_0382_000001
> 2021-06-16 08:45:04,141 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1623830067124_0382_000001
> 2021-06-16 08:45:04,141 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1623830067124_0382_000001
> 2021-06-16 08:45:04,154 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1623830067124_0382_01_000001, AllocationRequestId: 0, Version: 0, NodeId: node1:34753, NodeHttpAddress: 927a9ef942b24b1eaa0e99c39d4e73f90224b902983:8042, Resource: <memory:29696, vCores:4>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.1.2.3:34753 }, ExecutionType: GUARANTEED, ] for AM appattempt_1623830067124_0382_000001
> 2021-06-16 08:45:04,776 INFO org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully decommission node node1:34753 with state RUNNING
> 2021-06-16 08:45:04,776 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node node1:34753 in DECOMMISSIONING.
> 2021-06-16 08:45:04,776 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: node1:34753 Node Transitioned from RUNNING to DECOMMISSIONING
> 2021-06-16 08:45:05,131 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node node1:34753 as it is now DECOMMISSIONED
> 2021-06-16 08:45:05,131 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: node1:34753 Node Transitioned from DECOMMISSIONING to DECOMMISSIONED
> 2021-06-16 08:45:05,131 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1623830067124_0382_01_000001 Container Transitioned from ACQUIRED to KILLED
> {code}
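
A minimal sketch of the check suggested in the description, assuming the scheduler can report how many containers it has launched on a node. It is self-contained and does not use Hadoop classes: `SchedulerNodeView`, `launchedContainerCount()`, and `canDeactivateBeforeTimeout` are hypothetical names introduced for illustration, and the actual patch in the pull request may take a different approach.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class GracefulDecommissionSketch {

  /**
   * Hypothetical stand-in for the scheduler's per-node view
   * (the description proposes consulting FiCaSchedulerNode).
   */
  interface SchedulerNodeView {
    // Assumption: the scheduler tracks containers it has launched on the node.
    int launchedContainerCount();
  }

  /**
   * Returns true only if a DECOMMISSIONING node may be moved to
   * DECOMMISSIONED before the graceful timeout expires.
   */
  static boolean canDeactivateBeforeTimeout(
      Collection<String> runningApplications,
      List<String> keepAliveApps,
      SchedulerNodeView schedulerNode) {
    // The two conditions already present in the quoted transition code.
    boolean noRunningApps = runningApplications.isEmpty();
    boolean noKeepAliveApps = keepAliveApps == null || keepAliveApps.isEmpty();
    // The additional condition proposed in the description: a container the
    // scheduler has launched keeps the node alive even if the NodeManager has
    // not yet reported it in a status update.
    boolean noLaunchedContainers = schedulerNode.launchedContainerCount() == 0;
    return noRunningApps && noKeepAliveApps && noLaunchedContainers;
  }

  public static void main(String[] args) {
    // Scenario modeled on the report: one container launched by the RM,
    // but the node's running-application set is still empty.
    boolean deactivate =
        canDeactivateBeforeTimeout(Collections.emptySet(), null, () -> 1);
    System.out.println("Deactivate before timeout? " + deactivate);
  }
}
```

With only the first two conditions, the scenario in `main` would allow early deactivation, which matches the behavior shown in the ResourceManager logs; the third condition is what blocks it in this sketch.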



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
