You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/12/20 10:10:47 UTC

[GitHub] [hadoop] Daniel-009497 opened a new pull request, #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Daniel-009497 opened a new pull request, #5247:
URL: https://github.com/apache/hadoop/pull/5247

   Tens of thousands of meaningless logs are frequently printed during ResourceManager startup and recover container.
   As we know, ResourceManager will always keep 10k application information by default. In our very big scale cluster, it is very usual that resourcemanager try to recover the containers which already finished and does not exist in ResourceManager but still reported by nodemanager.
   Under this case, below logs will be frequently printed, more importantly, this log is meaningless, in real production setups, the maintainers actually more care about which containers are properly recovered or killed not the ones are skipped.
   The related code are as follows,
   ![image](https://user-images.githubusercontent.com/78128883/208641490-28abaf03-d4c5-4735-93a3-aae9aca21227.png)
   
   So we move the log into function killOrphanContainerOnNode().
   ![image](https://user-images.githubusercontent.com/78128883/208641574-9bf36787-d3b3-4b0d-9ee9-aa14e4151da1.png)
   
   Only the containers to be killed need to be loged which is vital for trouble shooting to distinguish whether the containers are kill by hadoop inner mechanism or by users themselves.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on a diff in pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
slfan1989 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053929516


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   I took a quick look at it yesterday, and today I'll take a closer look at the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on a diff in pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
slfan1989 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053205820


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   Thank you very much for your contribution, but I think different people may have different understandings of the log output. This does not seem to be a general change, and I have not seen much value from this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on a diff in pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
slfan1989 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053203404


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   Will this change result in more log output? Does this mean that all apps that enter this judgment must be printed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#issuecomment-1359348399

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  4s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   0m 59s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 55s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  5s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   0m 58s | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) |  hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m  7s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 55s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 39s | [/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/artifact/out/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) |  hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 58s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 47s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  99m 12s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 195m 44s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5247 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 690a3d44d790 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 14644746d81db503457d71949fbe69527281322f |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/testReport/ |
   | Max. process+thread count | 969 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] Daniel-009497 commented on a diff in pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
Daniel-009497 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053913286


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   > Will this change result in more log output? Does this mean that all apps that enter this judgment must be printed?
   
   @slfan1989  Not really, Only the orphan container which status is unfinished will be killed and logged here. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on a diff in pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
slfan1989 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053929516


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   I had a quick look at it yesterday, I'll be going through the code today.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] Daniel-009497 commented on a diff in pull request #5247: YARN-11402 Plenty of useless logs printed during ResourceManager startup and recover containers of unknown application

Posted by GitBox <gi...@apache.org>.
Daniel-009497 commented on code in PR #5247:
URL: https://github.com/apache/hadoop/pull/5247#discussion_r1053913286


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -506,6 +506,7 @@ public void setEntitlement(String queue, QueueEntitlement entitlement)
   private void killOrphanContainerOnNode(RMNode node,
       NMContainerStatus container) {
     if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
+      LOG.warn("Killing container " + container + " for unknown application");

Review Comment:
   > Will this change result in more log output? Does this mean that all apps that enter this judgment must be printed?
   
   @slfan1989  Not really, Only the orphan container which status is unfinished will be killed and logged here.  What is really matters is the minority ones be handled not the majority ones be skipped. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org