Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:18:19 UTC

[jira] [Resolved] (SPARK-20658) spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect

     [ https://issues.apache.org/jira/browse/SPARK-20658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-20658.
----------------------------------
    Resolution: Incomplete

> spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-20658
>                 URL: https://issues.apache.org/jira/browse/SPARK-20658
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.0
>            Reporter: Paul Jones
>            Priority: Minor
>              Labels: bulk-closed
>
> I'm running a job in YARN cluster mode with `spark.yarn.am.attemptFailuresValidityInterval=1h` specified both in spark-defaults.conf and in my spark-submit command. (The flag shows up in the environment tab of the Spark history server, so it appears to be specified correctly.)
> However, I just had a job die with four AM failures (three of the four failures were over an hour apart), so I'm confused about what could be going on. I haven't figured out the cause of the individual failures, so is it possible that certain types of failures are always counted? E.g. do jobs that are killed due to memory issues always count against the limit?
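
For context, a minimal sketch of how this interval is usually supplied for a YARN cluster-mode submission (the class and JAR names below are placeholders, not taken from this issue):

    # spark-defaults.conf
    spark.yarn.am.attemptFailuresValidityInterval   1h

    # Equivalent spark-submit invocation (com.example.MyApp and my-app.jar
    # are placeholders for the actual application):
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
      --class com.example.MyApp \
      my-app.jar

The expectation is that YARN only counts AM failures that occurred within the configured interval toward the maximum attempt limit, so failures spaced more than an hour apart should not normally accumulate, which is what the report above questions.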



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org