Posted to user@hadoop.apache.org by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2019/10/22 01:51:41 UTC

need help diagnosing errors

Dear Hadoop community,

I am trying to deepen my Hadoop knowledge. In this case I am trying to make my spark-submit job fail with an out-of-memory (OOM) error, but I am not able to find the root cause in the logs.

This is the script I am running:

a = "bigword"
b = "bigword"
print(a)

for i in range(1000000000):
    a += b

I run it with spark.driver.memory set to 3g.
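
For reference, a rough back-of-the-envelope sketch of how large the string in the script above would grow. It assumes about one byte per character (an ASCII-only str) and reads "3g" as 3 GiB; the helper names length_after and first_iteration_over are hypothetical. It does not say which limit is actually hit (the JVM heap set by spark.driver.memory or the YARN container limit), only that the string alone would outgrow 3 GiB well before the loop ends.

```python
# Back-of-the-envelope size of the string built by the script above.
# Assumption: roughly one byte per character (ASCII-only str); per-object
# overhead and temporary copies made while the string grows are ignored.

WORD = "bigword"              # value of both a and b in the script
ITERATIONS = 1_000_000_000    # range(1000000000)
LIMIT_BYTES = 3 * 1024**3     # spark.driver.memory 3g, read here as 3 GiB


def length_after(iterations: int, word: str = WORD) -> int:
    """Length of `a` after `iterations` rounds of `a += b`."""
    return len(word) * (iterations + 1)


def first_iteration_over(limit_bytes: int, word: str = WORD) -> int:
    """Smallest number of completed rounds after which len(a) exceeds limit_bytes."""
    # len(word) * (n + 1) > limit_bytes  <=>  n >= limit_bytes // len(word)
    return limit_bytes // len(word)


if __name__ == "__main__":
    print("final length in characters:", length_after(ITERATIONS))
    print("rounds needed to pass 3 GiB:", first_iteration_over(LIMIT_BYTES))
```

Under these assumptions the loop would need only a fraction of its 1,000,000,000 rounds to pass the configured 3g.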

The job fails as expected, but I can't find the real reason because the logs are not clear enough:

Attempt 1:
AM Container for appattempt_1570749574365_0050_000001 exited with exitCode: 11
Failing this attempt.Diagnostics: [2019-10-22 12:19:06.273]Exception from container-launch.
Container id: container_e15_1570749574365_0050_01_000001
Exit code: 11
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is mansop
main : requested yarn user is mansop
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /d1/hadoop/yarn/local/nmPrivate/application_1570749574365_0050/container_e15_1570749574365_0050_01_000001/container_e15_1570749574365_0050_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2019-10-22 12:19:06.277]Container exited with a non-zero exit code 11. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d1/hadoop/yarn/local/filecache/13/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2019-10-22 12:19:06.278]Container exited with a non-zero exit code 11. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d1/hadoop/yarn/local/filecache/13/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570749574365_0050 Then click on links to logs of each attempt.

Attempt 2:
AM Container for appattempt_1570749574365_0050_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2019-10-22 12:20:50.591]Exception from container-launch.
Container id: container_e15_1570749574365_0050_02_000001
Exit code: 13
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is mansop
main : requested yarn user is mansop
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /d0/hadoop/yarn/local/nmPrivate/application_1570749574365_0050/container_e15_1570749574365_0050_02_000001/container_e15_1570749574365_0050_02_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2019-10-22 12:20:50.596]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d0/hadoop/yarn/local/filecache/10/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2019-10-22 12:20:50.598]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d0/hadoop/yarn/local/filecache/10/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570749574365_0050 Then click on links to logs of each attempt.
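
To compare the attempts side by side, the exit codes can be pulled out of diagnostics text like the above. This is a minimal sketch that assumes the "AM Container for ... exited with exitCode: N" wording shown in these logs; the function name exit_codes_by_attempt and the SAMPLE constant are hypothetical.

```python
import re

# Matches lines such as:
#   AM Container for appattempt_1570749574365_0050_000001 exited with exitCode: 11
ATTEMPT_RE = re.compile(
    r"AM Container for (appattempt_\d+_\d+_\d+) exited with\s+exitCode: (-?\d+)"
)


def exit_codes_by_attempt(diagnostics: str) -> dict:
    """Map each application attempt ID found in the text to its integer exit code."""
    return {attempt: int(code) for attempt, code in ATTEMPT_RE.findall(diagnostics)}


# Lines copied from the diagnostics in this message.
SAMPLE = """\
AM Container for appattempt_1570749574365_0050_000001 exited with exitCode: 11
Failing this attempt.Diagnostics: [2019-10-22 12:19:06.273]Exception from container-launch.
AM Container for appattempt_1570749574365_0050_000002 exited with exitCode: 13
"""

if __name__ == "__main__":
    for attempt, code in exit_codes_by_attempt(SAMPLE).items():
        print(attempt, code)
```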

Could someone please help me to understand:

What do exitCode: 13 and exitCode: 11 mean?

How should I keep troubleshooting?
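
One possible starting point, sketched under assumptions: the lookup table below mirrors the exit-code constants defined in Spark's YARN ApplicationMaster source as far as I know them, and it may not match every Spark version, so it should be checked against the installed release. The codes seen here could also come from somewhere other than the ApplicationMaster. The second helper only builds the standard "yarn logs -applicationId" command, which needs log aggregation to be enabled on the cluster; the function names are hypothetical.

```python
# Assumed mapping, based on constants in Spark's YARN ApplicationMaster
# (org.apache.spark.deploy.yarn.ApplicationMaster). Values may differ between
# Spark versions; verify against the source of the installed version.
SPARK_AM_EXIT_CODES = {
    10: "EXIT_UNCAUGHT_EXCEPTION",
    11: "EXIT_MAX_EXECUTOR_FAILURES",
    12: "EXIT_REPORTER_FAILURE",
    13: "EXIT_SC_NOT_INITED",
    14: "EXIT_SECURITY",
    15: "EXIT_EXCEPTION_USER_CLASS",
    16: "EXIT_EARLY",
}


def describe_exit_code(code: int) -> str:
    """Return the assumed ApplicationMaster constant name for an exit code."""
    return SPARK_AM_EXIT_CODES.get(code, "unknown (not in the assumed table)")


def yarn_logs_command(application_id: str) -> list:
    """Build the argument list that fetches aggregated logs for an application."""
    return ["yarn", "logs", "-applicationId", application_id]


if __name__ == "__main__":
    for code in (11, 13):
        print(code, describe_exit_code(code))
    # Printed rather than executed, so the sketch runs without a cluster.
    print(" ".join(yarn_logs_command("application_1570749574365_0050")))
```

The printed command can be run on a cluster node to get the full stdout and stderr of each container, which usually contain more detail than the truncated diagnostics above.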

Thank you very much

Manuel Sopena Ballesteros

Big Data Engineer | Kinghorn Centre for Clinical Genomics


a: 384 Victoria Street, Darlinghurst NSW 2010
p: +61 2 9355 5760  |  +61 4 12 123 123
e: manuel.sb@garvan.org.au<ma...@garvan.org.au>
