Posted to yarn-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/01 18:23:00 UTC

[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler

    [ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611965#comment-17611965 ] 

ASF GitHub Bot commented on YARN-11277:
---------------------------------------

aajisaka commented on code in PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#discussion_r985128293


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4954,6 +4964,18 @@
   </property>
 
   <property>
+    <name>yarn.nodemanager.log.delete.threshold.mb</name>
+    <value>102400</value>
+    <description>
+      Optional.
+      Default is 102400

Review Comment:
   Would you remove this line? The default is already documented in `<value>102400</value>`.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4954,6 +4964,18 @@
   </property>
 
   <property>
+    <name>yarn.nodemanager.log.delete.threshold.mb</name>
+    <value>102400</value>
+    <description>
+      Optional.
+      Default is 102400
+      Trigger log-dir deletion when size bigger than

Review Comment:
   What size? Total log size or the largest log file size?
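
   To make the two readings concrete, here is a minimal Java sketch that computes both candidates for a directory: the total size of all regular files and the size of the single largest file. The class name `LogDirSizeSketch` and its methods are hypothetical and are not part of the PR; only `java.nio.file` calls from the JDK are used.

   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.stream.Stream;

   /** Hypothetical helper: two possible meanings of "size" for a log dir. */
   final class LogDirSizeSketch {

     /** Sum of the sizes of all regular files under dir (recursive). */
     static long totalBytes(Path dir) throws IOException {
       try (Stream<Path> paths = Files.walk(dir)) {
         return paths.filter(Files::isRegularFile)
             .mapToLong(LogDirSizeSketch::sizeOf)
             .sum();
       }
     }

     /** Size of the largest regular file under dir, or 0 if there is none. */
     static long largestFileBytes(Path dir) throws IOException {
       try (Stream<Path> paths = Files.walk(dir)) {
         return paths.filter(Files::isRegularFile)
             .mapToLong(LogDirSizeSketch::sizeOf)
             .max()
             .orElse(0L);
       }
     }

     private static long sizeOf(Path file) {
       try {
         return Files.size(file);
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }

     public static void main(String[] args) throws IOException {
       // Build a throwaway directory with two fake container logs.
       Path dir = Files.createTempDirectory("container-logs");
       Files.write(dir.resolve("stdout"), new byte[300]);
       Files.write(dir.resolve("stderr"), new byte[100]);
       System.out.println("total=" + totalBytes(dir)
           + " largest=" + largestFileBytes(dir));
     }
   }
   ```

   Whichever reading the property intends, the description should name it explicitly, since the two values can differ by orders of magnitude on a directory with many small files.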



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4943,6 +4943,16 @@
     </description>
   </property>
 
+  <property>
+    <name>yarn.nodemanager.log.trigger.delete.by-size.enabled</name>
+    <value>false</value>
+    <description>
+      Optional.
+      Default is false

Review Comment:
   Remove this line. false is already documented in `<value>false</value>`.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java:
##########
@@ -90,6 +93,12 @@ protected void serviceInit(Configuration conf) throws Exception {
     this.deleteDelaySeconds =
         conf.getLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS,
                 YarnConfiguration.DEFAULT_NM_LOG_RETAIN_SECONDS);
+    this.enableTriggerDeleteBySize =
+        conf.getBoolean(YarnConfiguration.NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED,
+            YarnConfiguration.DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED);
+    this.deleteThresholdMb =
+        conf.getLong(YarnConfiguration.NM_LOG_DELETE_THRESHOLD_MB,
+            YarnConfiguration.DEFAULE_NM_LOG_DELETE_THRESHOLD_MB);

Review Comment:
   I recommend removing `mb` from the parameter name and using `conf.getLongBytes` to allow a suffix such as `100g`. Also, could you document how to use the suffix in yarn-site.xml as below?
   ```
   You can use the following suffix (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.), or provide the complete size in bytes (such as 134217728 for 128 MB).
   ```
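
   For illustration, the suffix convention described above can be sketched in plain Java as follows. This is a self-contained stand-in that assumes binary (1024-based) multipliers; it is not Hadoop's implementation of `getLongBytes`, and the class name `SizeSuffixSketch` and method `parseBytes` are hypothetical.

   ```java
   import java.util.Locale;

   /** Illustrative stand-in for suffix-aware size parsing; not Hadoop code. */
   final class SizeSuffixSketch {
     // Position in this string selects the power of 1024: k=1, m=2, ..., e=6.
     private static final String SUFFIXES = "kmgtpe";

     /** Parses values such as "128k", "512m", "100g", or a plain byte count. */
     static long parseBytes(String raw) {
       String s = raw.trim().toLowerCase(Locale.ROOT);
       if (s.isEmpty()) {
         throw new IllegalArgumentException("empty size");
       }
       int idx = SUFFIXES.indexOf(s.charAt(s.length() - 1));
       if (idx < 0) {
         // No recognized suffix: the value is already a byte count.
         return Long.parseLong(s);
       }
       long number = Long.parseLong(s.substring(0, s.length() - 1).trim());
       long multiplier = 1L << (10 * (idx + 1));
       // multiplyExact throws ArithmeticException instead of silently overflowing.
       return Math.multiplyExact(number, multiplier);
     }

     public static void main(String[] args) {
       System.out.println(parseBytes("128k"));
       System.out.println(parseBytes("100g"));
       System.out.println(parseBytes("134217728"));
     }
   }
   ```

   Under this convention `100g` is the same number of bytes as the current default of 102400 MB, so the default could be written with a suffix without changing its meaning.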



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java:
##########
@@ -4694,6 +4694,16 @@ public static boolean areNodeLabelsEnabled(
   public static final String DEFAULT_YARN_WORKFLOW_ID_TAG_PREFIX =
       "workflowid:";
 
+  /** Enabled trigger log-dir deletion by size for NonAggregatingLogHandler. */
+  public static final String NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED = NM_PREFIX +
+      "log.trigger.delete.by-size.enabled";
+  public static final boolean DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED = false;
+
+  /** Trigger log-dir deletion when size bigger than yarn.nodemanager.log.delete.threshold.mb.
+   *  Depends on yarn.nodemanager.log.trigger.delete.by-size.enabled = true. */
+  public static final String NM_LOG_DELETE_THRESHOLD_MB = NM_PREFIX + "log.delete.threshold.mb";
+  public static final long DEFAULE_NM_LOG_DELETE_THRESHOLD_MB = 100 * 1024;

Review Comment:
   typo: DEFAULE -> DEFAULT





> trigger deletion of log-dir by size for NonAggregatingLogHandler
> ----------------------------------------------------------------
>
>                 Key: YARN-11277
>                 URL: https://issues.apache.org/jira/browse/YARN-11277
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.4.0
>            Reporter: Xianming Lei
>            Priority: Minor
>              Labels: pull-request-available
>
> In our YARN cluster, the log files of some containers are too large, which causes the NodeManager to frequently switch to the unhealthy state. For logs that are too large, we can consider deleting them immediately instead of waiting for yarn.nodemanager.log.retain-seconds to elapse.
> Cluster environment:
>  # 8k+ nodes
>  # 500,000+ apps / day
> Configuration:
>  # yarn.nodemanager.log.retain-seconds=3days
>  # yarn.log-aggregation-enable=false
>  
>  
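
The behavior requested in the description can be sketched as a single decision: keep the normal retention delay unless size-based deletion is enabled and the log dir exceeds the threshold, in which case delete without delay. This is a minimal sketch under that assumption, not the logic in the PR; the class `DeleteDelaySketch`, the method `effectiveDelaySeconds`, and its parameters are hypothetical.

```java
/** Hypothetical sketch of the size-triggered deletion decision. */
final class DeleteDelaySketch {

  /**
   * Returns how long to wait before deleting an application's log dir.
   * Oversized dirs are deleted immediately when the feature is enabled;
   * everything else waits for the configured retention period.
   */
  static long effectiveDelaySeconds(boolean bySizeEnabled, long logDirBytes,
      long thresholdBytes, long retainSeconds) {
    if (bySizeEnabled && logDirBytes > thresholdBytes) {
      return 0L;
    }
    return retainSeconds;
  }

  public static void main(String[] args) {
    long threeDays = 3L * 24 * 60 * 60;          // retain-seconds from the report
    long threshold = 100L * 1024 * 1024 * 1024;  // 102400 MB expressed in bytes
    System.out.println(effectiveDelaySeconds(true, threshold + 1, threshold, threeDays));
    System.out.println(effectiveDelaySeconds(false, threshold + 1, threshold, threeDays));
  }
}
```

With the feature disabled the retention period always applies, which matches the proposed default of `false`.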



--
This message was sent by Atlassian Jira
(v8.20.10#820010)