You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by witgo <gi...@git.apache.org> on 2014/09/07 14:58:56 UTC

[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

GitHub user witgo opened a pull request:

    https://github.com/apache/spark/pull/2311

    [SPARK-3293] yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-3293

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2311
    
----
commit 3828707a9b67f0b54a8f6a0a9b36307c2ae14429
Author: GuoQiang Li <wi...@qq.com>
Date:   2014-09-03T06:00:52Z

    yarn check exit code

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17946079
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Well my feeling is that if the driver program crashed, it's going to crash again the next time you try. Is that not the case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56308708
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20631/consoleFull) for   PR 2311 at commit [`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
     * This patch **does not** merge cleanly!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54819294
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19979/consoleFull) for   PR 2311 at commit [`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17951431
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Yeah, unless we want to add more logic about trying to figure out what shouldn't be retried, the easy thing is to just retry on any failure.  Obviously if the machine dies this code won't be running, its more that something weird happens causing it to crash or exit badly.  
    
    There are actually some potential issues with rerunning the AM though.  One is what we refer to as split brain (one AM losing connection from RM but still running so it starts a second AM) and both write to the same output dir and cause issues with the output data.  I filed a jira for this to try to handle in Spark AM.
    
    The second occurs if the fist run had committed its output and we rerun it when shouldn't. 
     The reason we don't want that to happen is to prevent data corruption. Many times in MR one job will start once anothers output is committed, so if it was to get changed out from under them by a rerun of the AM it could lose data.  I'm not sure that same kind of check is as easy with Spark.  
    
    MR handles both of those cases. Obviously if your MR job is writing to some other service or using custom fileoutput or has some other side effects its up to the user to guarantee that it can be rerun.  
    
    I'm assuming its the users responsibility with Spark since spark can rerun tasks/stages on failure. Any input on that Matei?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56408312
  
    I've been having a multiple different issues with the success/failure reporting (SPARK-3627).  Does this handle all of those?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17864288
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
         } else {
           runExecutorLauncher(securityMgr)
         }
    -
    -    if (finalStatus != FinalApplicationStatus.UNDEFINED) {
    -      finish(finalStatus)
    -      0
    -    } else {
    -      1
    -    }
    +    exitCode
    --- End diff --
    
    Nevermind. I see you added the call to the shutdown hook. Still, the `finalStatus` needs to be properly set in `runExecutorLauncher`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56914209
  
    Yes, this does not involve `org.apache.spark.deploy.yarn.Client` class, which run outside the cluster.
    We should call `YarnClient.killApplication` when an uncaught exception is thrown.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56524702
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20706/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56910981
  
    @witgo There are cases not covered by this pr that I would like to fix up dealing with exit codes and final status.  Would you mind if I just pull these changes into mine?  I want to try to fix it up across the board.  I hope to have something up tomorrow if all the testing goes well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54747134
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19948/consoleFull) for   PR 2311 at commit [`3828707`](https://github.com/apache/spark/commit/3828707a9b67f0b54a8f6a0a9b36307c2ae14429).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17251254
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -369,7 +378,17 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
               // Some apps have "System.exit(0)" at the end.  The user thread will stop here unless
               // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
               status = FinalApplicationStatus.SUCCEEDED
    -        } finally {
    +        } catch {
    +          case NoExitsException(exitCode) =>
    +            exitStatus = exitCode
    +            if (exitStatus == 0) {
    +              status = FinalApplicationStatus.SUCCEEDED
    +            }
    +          case e: Throwable =>
    --- End diff --
    
    Better not to catch `Throwable`, and don't set `exitCode` to -1 (in the shell, that translates to an exit code of 255, which is generally interpreted to mean different things than just "process exited" [1].)
    
    [1] Some discussion: http://en.wikipedia.org/wiki/Exit_status



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56410208
  
    @tgravescs this change should handle uncaught exceptions and explicit `System.exit` in the driver code. Not sure if that covers all the issues you're seeing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56537068
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20707/consoleFull) for   PR 2311 at commit [`617efef`](https://github.com/apache/spark/commit/617efeff1a6bec059fd24652fd783f54dbb9915c).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56525297
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17948054
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    @vanzin But this code is *in* the AM, isn't it? That's what I'm saying above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54812961
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56537077
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20707/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17951694
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Our `saveAs*File` operations in Spark also avoid running if the output directly already exists, so there is that support in there. But for Spark I'd make this AM rerun configurable. The reason is that some Spark apps might be accessing weird data sources, serving a UI to outside users, or just doing fairly complicate logic inside that is hard to retry. These are things that MR jobs don't do as commonly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17868237
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -29,6 +29,7 @@ import org.apache.hadoop.yarn.api._
     import org.apache.hadoop.yarn.api.records._
     import org.apache.hadoop.yarn.conf.YarnConfiguration
     
    +import org.apache.spark.executor.{ExecutorUncaughtExceptionHandler, ExecutorExitCode}
    --- End diff --
    
    I don't see these used anywhere?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17863114
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
         } else {
           runExecutorLauncher(securityMgr)
         }
    -
    -    if (finalStatus != FinalApplicationStatus.UNDEFINED) {
    --- End diff --
    
    Since you're changing this logic, you need to modify `runExecutorLauncher` so that it correctly reports an error on failure. Right now it will always exit with 0 and will not report the final status to the RM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17882325
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    I think this could cause problems with YARN retry logic.  We shouldn't be finishing (unregistering with RM) unless we've explicitly succeeded or failed and don't want a retry to happen in those weird cases.  Since this is in shutdown hook I'm not sure we explicitly know that. 
    
    i guess it comes down to us defining when the spark app is cleanly exiting.  This could be failed, killed, or succeeded.  Unfortunately I'm not sure spark really has this final reporting. 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56308622
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54777212
  
    You will have to rebase your patch for tests to pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56912131
  
    also maybe I'm missing something but I don't see how the changes help in yarn-client mode? The changes you made were on the application master running on a separate host then the driver itself, startUserClass is only used in cluster mode, and when the disconnect happens in client mode we always report success.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54802000
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19974/consoleFull) for   PR 2311 at commit [`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54980572
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20042/consoleFull) for   PR 2311 at commit [`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56352566
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20653/consoleFull) for   PR 2311 at commit [`5f0883d`](https://github.com/apache/spark/commit/5f0883d8ae00fe632ff237965fb5bc73c026c7c2).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54806033
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19974/consoleFull) for   PR 2311 at commit [`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    case class NoExitsException(exitCode: Int) extends SecurityException`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17946154
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Basically there are a few cases:
    - If the machine with the AM dies, then sure, YARN could relaunch it (hopefully that is user-configurable in case their code has weird side effects)
    - If the driver returns 0, it's finished
    - If the driver crashes for another reason, it could be a bug in the driver; I'm not sure we should just resubmit in that case


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17863181
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
         } else {
           runExecutorLauncher(securityMgr)
         }
    -
    -    if (finalStatus != FinalApplicationStatus.UNDEFINED) {
    -      finish(finalStatus)
    -      0
    -    } else {
    -      1
    -    }
    +    exitCode
    --- End diff --
    
    Feels like with your changes there's a possibility that `finish()` might never be called. Might be better to add a call to that method here before returning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17946116
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Well, that depends a lot on the crash. If the host where the AM was running just dies, running it on a different host might succeed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56829268
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20808/consoleFull) for   PR 2311 at commit [`9655aa8`](https://github.com/apache/spark/commit/9655aa8c8e99f49e4598d5f37548afeb6f1d9d6a).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56311778
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20631/consoleFull) for   PR 2311 at commit [`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
     * This patch **fails** unit tests.
     * This patch **does not** merge cleanly!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56358732
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20653/consoleFull) for   PR 2311 at commit [`5f0883d`](https://github.com/apache/spark/commit/5f0883d8ae00fe632ff237965fb5bc73c026c7c2).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17930410
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Just to understand, is the YARN retry logic there for resubmitting the AM? Most Spark applications don't support that, since their driver contains a lot of state. (Though I guess applications submitted in yarn-client mode might).
    
    In general, if the user's driver is running within the AM, I would say it's successful if the driver's main() returns 0 or calls System.exit(0). If the driver is running remotely, it should tell the AM when to shut down cleanly when it closes its SparkContext or exits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54748722
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19948/consoleFull) for   PR 2311 at commit [`3828707`](https://github.com/apache/spark/commit/3828707a9b67f0b54a8f6a0a9b36307c2ae14429).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    case class NoExitsException(exitCode: Int) extends SecurityException`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56829283
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20808/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56525799
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20707/consoleFull) for   PR 2311 at commit [`617efef`](https://github.com/apache/spark/commit/617efeff1a6bec059fd24652fd783f54dbb9915c).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54990398
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20042/consoleFull) for   PR 2311 at commit [`7efcca0`](https://github.com/apache/spark/commit/7efcca0297802c6cfe685b61ad5c79f639cc5b0d).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56912003
  
    Ok, no problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-56821516
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20808/consoleFull) for   PR 2311 at commit [`9655aa8`](https://github.com/apache/spark/commit/9655aa8c8e99f49e4598d5f37548afeb6f1d9d6a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:

    https://github.com/apache/spark/pull/2311


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2311#issuecomment-54830677
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19979/consoleFull) for   PR 2311 at commit [`2ebe62e`](https://github.com/apache/spark/commit/2ebe62edd3056f842e16643c50cd77b04a6cbce5).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    case class NoExitsException(exitCode: Int) extends SecurityException`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17251037
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -355,11 +355,20 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
       private def startUserClass(): Thread = {
         logInfo("Starting the user JAR in a separate Thread")
         System.setProperty("spark.executor.instances", args.numExecutors.toString)
    +    var stopped = false
    +    case class NoExitsException(exitCode: Int) extends SecurityException
         val mainMethod = Class.forName(args.userClass, false,
           Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
     
         val t = new Thread {
           override def run() {
    +        System.setSecurityManager(new java.lang.SecurityManager() {
    +          override def checkExit(paramInt: Int) {
    +            if (!stopped) {
    +              throw new NoExitsException(paramInt)
    --- End diff --
    
    Instead of throwing the exception, why not just set `exitStatus` directly and avoid having a custom exception? You can also use that as a trigger to call "finish()" if it hasn't been called yet, with a nice error message about the app having called `System.exit`.
    
    That avoids messing with the app's exception handling, since very few people realize `System.exit` can throw and thus don't really plan for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17884602
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    @mateiz @pwendell  is there anything in spark that we can/should use  where we consider the application in a final state (either success or failure) such that we wouldn't want to retry it?    On MR there is explicit states for finishing and it also has checks for the committed output.   Spark I don't think is quite as straight forward.  Do we have any guidance on what an application should do on error and success?
    
    ie can we say if sc.stop() is called then it finished cleanly, or perhaps if System.exit(0) is called. Should we be peaking at the metrics or something to see if any jobs failed with the application. thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3293] yarn's web show "SUCCEEDED" when ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17940217
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
             if (sc != null) {
               logInfo("Invoking sc stop from shutdown hook")
               sc.stop()
    -          finish(FinalApplicationStatus.SUCCEEDED)
    +        }
    +
    +        // Shuts down the AM.
    +        if (!finished) {
    --- End diff --
    
    Yes the retry logic is for resubmitting the AM in the case it was on machine that went down, slow, network loss, etc..
    
    Ok so we'll go with if it System.exit(0) or returns 0 then it succeeded and everything else for now is failure and will retry if configured to retry.  If we don't want it to retry in certain cases we would need more information from the driver.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org