Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/07/28 22:30:04 UTC

[jira] [Commented] (SPARK-9416) Yarn logs say that Spark Python job has succeeded even though job has failed in Yarn cluster mode

    [ https://issues.apache.org/jira/browse/SPARK-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644996#comment-14644996 ] 

Sean Owen commented on SPARK-9416:
----------------------------------

Looks like a duplicate of SPARK-7736

> Yarn logs say that Spark Python job has succeeded even though job has failed in Yarn cluster mode
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9416
>                 URL: https://issues.apache.org/jira/browse/SPARK-9416
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.1
>         Environment: 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Elkhan Dadashov
>
> While running the Spark word count Python example with an intentional mistake in Yarn cluster mode, the Spark terminal logs (Yarn client logs) state the final status as SUCCEEDED, but the log files for the Spark application state the correct result, indicating that the job failed.
> The terminal log output and the application log output contradict each other.
> If I run the same job in local mode, the terminal logs and application logs match: both state that the job failed due to the expected error in the Python script.
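> For comparison, a local-mode run of the same script (a hypothetical invocation, assuming the same input path resolves) would be:
> ./bin/spark-submit --master local wordcount.py /README.md
> and there the NameError appears directly in the terminal.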
> More details on the scenario:
> While running the Spark word count Python example in Yarn cluster mode, I make an intentional error in wordcount.py by changing this line (I am using Spark 1.4.1, but the problem also exists in Spark 1.4.0 and 1.3.0, which I tested):
> lines = sc.textFile(sys.argv[1], 1)
> into this line:
> lines = sc.textFile(nonExistentVariable,1)
> where the variable nonExistentVariable was never created or initialized.
> Then I run that example with this command (I put README.md into HDFS before running it):
> ./bin/spark-submit --master yarn-cluster wordcount.py /README.md
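> For reference, a minimal sketch of the modified wordcount.py (reconstructed from the bundled example; the broken textFile call is the only intentional change):
>
> from __future__ import print_function
> from pyspark import SparkContext
>
> if __name__ == "__main__":
>     sc = SparkContext(appName="PythonWordCount")
>     # Intentional bug: nonExistentVariable is never defined, so the
>     # driver raises NameError before any RDD work starts.
>     lines = sc.textFile(nonExistentVariable, 1)
>     counts = lines.flatMap(lambda x: x.split(' ')) \
>                   .map(lambda x: (x, 1)) \
>                   .reduceByKey(lambda a, b: a + b)
>     for (word, count) in counts.collect():
>         print("%s: %i" % (word, count))
>     sc.stop()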
> The job runs and finishes successfully according to the logs printed in the terminal:
> Terminal logs:
> ...
> 15/07/23 16:19:17 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:18 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:19 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:20 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:21 INFO yarn.Client: Application report for application_1437612288327_0013 (state: FINISHED)
> 15/07/23 16:19:21 INFO yarn.Client: 
> 	 client token: N/A
> 	 diagnostics: Shutdown hook called before final status was reported.
> 	 ApplicationMaster host: 10.0.53.59
> 	 ApplicationMaster RPC port: 0
> 	 queue: default
> 	 start time: 1437693551439
> 	 final status: SUCCEEDED
> 	 tracking URL: http://localhost:8088/proxy/application_1437612288327_0013/history/application_1437612288327_0013/1
> 	 user: edadashov
> 15/07/23 16:19:21 INFO util.Utils: Shutdown hook called
> 15/07/23 16:19:21 INFO util.Utils: Deleting directory /tmp/spark-eba0a1b5-a216-4afa-9c54-a3cb67b16444
> But if I look at the log files generated for this application in HDFS, they indicate the failure of the job with the correct reason:
> Application log files:
> ...
> stdout:
> Traceback (most recent call last):
>   File "wordcount.py", line 32, in <module>
>     lines = sc.textFile(nonExistentVariable,1)
> NameError: name 'nonExistentVariable' is not defined
> The terminal output (Yarn client logs), final status: SUCCEEDED, does not match the application log result: failure of the job (NameError: name 'nonExistentVariable' is not defined).
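> A hypothetical workaround (not part of the original report) is to inspect the aggregated Yarn logs for a Python traceback instead of trusting the reported final status, e.g.:
> yarn logs -applicationId application_1437612288327_0013 | grep -B 1 -A 3 "Traceback"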



