Posted to hdfs-user@hadoop.apache.org by "huozhanfeng@gmail.com" <hu...@gmail.com> on 2014/07/03 06:21:49 UTC

How to limit MRJob's stdout/stderr size(yarn2.3)

Hi friends,

    When an MR job prints too much to stdout or stderr, the disk fills up. This is now affecting our platform management.

    I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (based on the approach in org.apache.hadoop.mapred.TaskLog) so that it generates the following launch command:

exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild $test_IP 53911 attempt_1403930653208_0003_m_000000_0 2 | tail -c 102 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr ; exit $PIPESTATUS "


    But it doesn't take effect. 
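
    For reference, the pipe pattern used in that command can be reduced to a standalone sketch that does not involve Hadoop at all. It is only an illustration of the intended behaviour: noisy_task, LOG_DIR and the two byte limits are hypothetical names standing in for the YarnChild JVM, the container log directory and the configured sizes. It assumes bash, since it relies on PIPESTATUS.

```shell
#!/bin/bash
# Sketch only: noisy_task, LOG_DIR and the byte limits are hypothetical
# stand-ins for the YarnChild JVM, the container log directory and the
# configured limits.

LOG_DIR=$(mktemp -d)
STDOUT_LIMIT=100   # bytes kept from stdout
STDERR_LIMIT=50    # bytes kept from stderr

# Writes far more than the limits to both streams, then fails with status 3.
noisy_task() {
  local i
  for ((i = 1; i <= 1000; i++)); do
    echo "stdout line $i"
    echo "stderr line $i" >&2
  done
  return 3
}

# Runs "$@", keeping only the last bytes of each stream in separate files
# and returning the exit status of the command itself, not of tail.
run_capped() {
  (
    # Inner pipe: stdout goes through tail into the stdout file.
    "$@" | tail -c "$STDOUT_LIMIT" > "$LOG_DIR/stdout"
    exit "${PIPESTATUS[0]}"
  ) 2>&1 | tail -c "$STDERR_LIMIT" > "$LOG_DIR/stderr"
  # Outer pipe: only stderr is left on the subshell's output at this point.
  return "${PIPESTATUS[0]}"
}

status=0
run_capped noisy_task || status=$?
echo "exit status: $status"
echo "stdout bytes: $(( $(wc -c < "$LOG_DIR/stdout") ))"
echo "stderr bytes: $(( $(wc -c < "$LOG_DIR/stderr") ))"
```

    Note that tail -c, when reading from a pipe, keeps the last N bytes and should only write them once its input ends, so with this pattern the log files would be expected to stay empty while the task is still running.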

    Then, while debugging the NodeManager with "export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y", I found that the command does work when I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())).

    I suspect a concurrency problem is preventing the piped shell command from running properly. This matters to us, and I need help.

    See https://issues.apache.org/jira/browse/YARN-2231

Thanks,



Zhanfeng Huo

Re: How to limit MRJob's stdout/stderr size(yarn2.3)

Posted by Adam Kawa <ka...@gmail.com>.
There is a setting like

<property>
  <name>mapreduce.task.userlog.limit.kb</name>
  <value>0</value>
  <description>The maximum size of user-logs of each task in KB. 0 disables
the cap.
  </description>
</property>

but I have not tried it on YARN.

If your disks fill up because you run many applications and tasks that
produce logs, you could also consider enabling log aggregation to HDFS.
Truncating logs has the disadvantage that you might lose information that
would be useful for debugging or performance analysis (e.g. a limit can be
fine for some jobs, but for others you might want access to the complete
log).


2014-07-03 6:21 GMT+02:00 huozhanfeng@gmail.com <hu...@gmail.com>:

> Hi,friend:
>
>     When a MRJob print too much stdout or stderr log, the disk will be
> filled. Now it has influence our platform management.
>
>     I have improved org.apache.hadoop.mapred.MapReduceChildJVM.java(come
> from@org.apache.hadoop.mapred.TaskLog) to generate the execute cmd
> as follows:
>
> exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true
> -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp
> -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002
> -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA
> org.apache.hadoop.mapred.YarnChild $test_IP 53911
> attempt_1403930653208_0003_m_000000_0 2 | tail -c 102
> >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout
> ; exit $PIPESTATUS ) 2>&1 | tail -c 10240
> >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr
> ; exit $PIPESTATUS "
>
>
>     But it doesn't take effect.
>
>     And then, when I use "export YARN_NODEMANAGER_OPTS=-Xdebug
> -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y" for debuging
> NodeManager, I find when I set the BreakPoints at
> org.apache.hadoop.util.Shell(line 450:process = builder.start()) and
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch(line
> 161:List<String> newCmds = new ArrayList<String>(command.size())) the cmd
> will work.
>
>     I doubt there's concurrency problem caused pipe shell will not perform
> properly. It matters, and I need help.
>
>    @https://issues.apache.org/jira/browse/YARN-2231
>
> thanks
>
> ------------------------------
> Zhanfeng Huo
>
