Posted to reviews@spark.apache.org by piaozhexiu <gi...@git.apache.org> on 2015/04/03 08:37:29 UTC

[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

GitHub user piaozhexiu opened a pull request:

    https://github.com/apache/spark/pull/5343

    [SPARK-6692][YARN] Make it possible to kill AM in YARN cluster mode

    I understand that yarn-cluster mode is designed for a fire-and-forget model; therefore, terminating the yarn client doesn't kill the AM.
    
    However, it is very common for users to submit Spark jobs via a job scheduler (e.g. Apache Oozie) or a remote job server (e.g. Netflix Genie), where killing the yarn client is expected to terminate the AM.
    It is true that yarn-client mode can be used in such cases, but then the yarn client sometimes needs a lot of heap memory for big jobs. In fact, yarn-cluster mode is ideal for big jobs because the AM can be given arbitrary heap memory, unlike the yarn client. So it would be very useful to be able to kill the AM even in yarn-cluster mode.
    
    In addition, Spark jobs often become zombie jobs if users ctrl-c them as soon as they're accepted (but not yet running). Although they're eventually shut down after the AM times out, it would be nice if the AM could be killed immediately in such cases too.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/piaozhexiu/spark SPARK-6692

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5343.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5343
    
----
commit ef2793c497d7b8f3baf03a47cbbdf6845e31f05e
Author: Cheolsoo Park <ch...@netflix.com>
Date:   2015-04-03T06:35:24Z

    Make it possible to kill AM in YARN cluster mode when the client is
    terminated

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89337784
  
    I'm somewhat dubious about whether this is needed.  It takes a single command to kill a job: `yarn application -kill`.  Also, the advantage of the latter is that it should return a non-zero exit code if something goes wrong, whereas there's no automated way to know that a kill went through when the client is killed.  Also, for comparison, there's no similar option for killing a MapReduce job.
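    For context, a launcher that needs that confirmation could wrap the YARN CLI and check its exit code. A minimal sketch under stated assumptions: `kill_spark_app` is a hypothetical helper, and the application ID would come from parsing spark-submit output or the ResourceManager REST API, neither of which this PR provides.

    ```shell
    # Sketch: request a kill via the YARN CLI and report whether it was accepted.
    # Relies on `yarn application -kill` returning non-zero on failure.
    kill_spark_app() {
      local app_id="$1"
      if yarn application -kill "$app_id"; then
        echo "kill accepted for $app_id"
      else
        echo "kill failed for $app_id" >&2
        return 1
      fi
    }
    ```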




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89343218
  
    I see.  Do Hive/Pig/Sqoop have active clients, or are they just monitoring progress in the same way the Spark yarn-cluster client does?




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89342280
  
    Hi @sryza , thanks for the comment. You're right that it only takes a single command to kill the AM. But in reality, sending that single command from a job server isn't always that simple.
    
    For example, in Genie, all it does to launch a job is spawn a shell process and execute the spark-submit command. So the application ID the job was given isn't passed back to the launcher. Of course, you can argue that Genie needs to be rewritten, and we're going to. But this model has worked well so far across Hadoop tools such as Hive, Pig, and Sqoop, so I imagine that many people would run into the same problem I did.
    
    Another point is that this behavior confuses users. Since they're used to the old Hive/Pig/Sqoop behavior, they expect that ctrl-c'ing their commands kills their jobs. As a platform operator, I'd like to keep the behavior of the different tools as consistent as possible so that confusion can be avoided.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89190287
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-90205393
  
    I see. Thanks for the clarification.
    
    That's a valid point. You're right that I can't 100% guarantee whether the kill was executed or not. But as long as job flows are designed to be idempotent, that shouldn't really matter. Production jobs that run in a cloud environment like ours follow that best practice anyway.
    
    The main motivation for me to add this option was to free up cluster resources as much as possible on a best-effort basis.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89190428
  
    Jenkins, test this please.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-90188806
  
    I'm a bit on the fence about this one as well.  Relying on the client going away to kill the job when it's running on the cluster seems unreliable to me.  For instance, let's say the client temporarily loses its network connection to YARN. The user thinks the job is killed when it really isn't.  I'm sure it would work just fine in the common case, though. But it seems like it would be better to ask for the status of the app and kill it via YARN.
    On the other hand, this would probably help things like `oozie job -kill` work.
    
    Also, can you perhaps rename the title on this, as this is really adding an option for the client to kill the AM when the client itself is killed.




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu closed the pull request at:

    https://github.com/apache/spark/pull/5343




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-93210723
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30302/
    Test PASSed.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5343#discussion_r27724310
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -105,6 +102,16 @@ private[spark] class Client(
         // Finally, submit and monitor the application
         logInfo(s"Submitting application ${appId.getId} to ResourceManager")
         yarnClient.submitApplication(appContext)
    +
    +    // In YARN cluster mode, the AM is not killed when the client is terminated by default.
    +    // But if spark.yarn.am.force.shutdown is set to true, the AM is forcibly shut down.
    +    if (isClusterMode && sparkConf.getBoolean("spark.yarn.am.force.shutdown", false)) {
    +      val shutdownHook = new Runnable {
    +        override def run() { yarnClient.killApplication(appId) }
    +      }
    +      ShutdownHookManager.get().addShutdownHook(shutdownHook, 0)
    --- End diff --
    
    It seems hacky to put all this cleanup logic into shutdown hooks; can't this be in `stop()`?




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89345472
  
    Hive/Pig/Sqoop have active clients in the sense that they manage the job DAG (i.e. the dependencies between MR jobs) and adjust parallelism between stages. But since their DAG scheduling is far simpler than the Spark DAG scheduler's, they're lightweight.




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-93179692
  
    Can someone please trigger the Jenkins build for this PR? It failed on a flaky test before and didn't rerun after that. Thanks!




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-90193027
  
    Hi @tgravescs , thank you for your comment.
    
    Changed the title as you suggested.
    
    > Relying on the fact the client goes away kills the job when its running on the cluster seems unreliable to me.
    
    Since I am only killing the AM inside the shutdown hook, this shouldn't be the case, no? If the network connection between the client and the AM is lost, the AM won't be killed as long as the client continues to run.
    
    > But it seems like it would be better to ask for status of the app and kill it via yarn.
    
    I also added a check of the AM's status prior to killing it. Will that address your concern?





[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89424359
  
    The jenkins failure is due to [SPARK-6701](https://issues.apache.org/jira/browse/SPARK-6701), which is unrelated to this PR.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89207803
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29654/
    Test FAILed.




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-93180360
  
    Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-93210645
  
      [Test build #30302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30302/consoleFull) for   PR 5343 at commit [`2a3fa38`](https://github.com/apache/spark/commit/2a3fa381708ce5319ca3786a079c866b70467e81).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `commons-math3-3.4.1.jar`
       * `snappy-java-1.1.1.7.jar`
    
     * This patch **removes the following dependencies:**
       * `commons-math3-3.1.1.jar`
       * `snappy-java-1.1.1.6.jar`





[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-90197726
  
    To clarify, I meant that if the client happens to die or get killed while the network is down, the RM is down, or the gateway box goes down, then the application might not be killed.  If some other service really wants to know the application is killed, it's more reliable to get that information directly from the RM.  But again, that all depends on how that service implements its checks.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89193013
  
      [Test build #29654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29654/consoleFull) for   PR 5343 at commit [`ef2793c`](https://github.com/apache/spark/commit/ef2793c497d7b8f3baf03a47cbbdf6845e31f05e).




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-107672875
  
    @piaozhexiu Echoing others' comments, I don't think we should tie the fate of the submission client to that of the AM. If fire-and-forget mode is enabled (i.e. `spark.yarn.submit.waitAppCompletion` is false), then this option kills your application immediately without allowing it to run. IMO it makes more sense for the AM to exit when the SparkContext exits, as it does today.
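    For reference, the fire-and-forget behavior mentioned above is chosen at submit time. A hedged sketch of such an invocation follows; the class name and jar are placeholders, not part of this PR.

    ```shell
    # Sketch: submit in yarn-cluster mode and return as soon as the
    # application is accepted, rather than waiting for completion.
    # com.example.MyApp and my-app.jar are placeholders.
    spark-submit \
      --master yarn-cluster \
      --conf spark.yarn.submit.waitAppCompletion=false \
      --class com.example.MyApp \
      my-app.jar
    ```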




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-93181658
  
      [Test build #30302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30302/consoleFull) for   PR 5343 at commit [`2a3fa38`](https://github.com/apache/spark/commit/2a3fa381708ce5319ca3786a079c866b70467e81).




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-89207796
  
      [Test build #29654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29654/consoleFull) for   PR 5343 at commit [`ef2793c`](https://github.com/apache/spark/commit/ef2793c497d7b8f3baf03a47cbbdf6845e31f05e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on the pull request:

    https://github.com/apache/spark/pull/5343#issuecomment-107677625
  
    @andrewor14 Thanks for the comment. Why don't I close my PR and JIRA, since it doesn't look like this will be committed? I will carry this modification in my own internal release.




[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

Posted by piaozhexiu <gi...@git.apache.org>.
Github user piaozhexiu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5343#discussion_r27756532
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -105,6 +102,16 @@ private[spark] class Client(
         // Finally, submit and monitor the application
         logInfo(s"Submitting application ${appId.getId} to ResourceManager")
         yarnClient.submitApplication(appContext)
    +
    +    // In YARN cluster mode, the AM is not killed when the client is terminated by default.
    +    // But if spark.yarn.am.force.shutdown is set to true, the AM is forcibly shut down.
    +    if (isClusterMode && sparkConf.getBoolean("spark.yarn.am.force.shutdown", false)) {
    +      val shutdownHook = new Runnable {
    +        override def run() { yarnClient.killApplication(appId) }
    +      }
    +      ShutdownHookManager.get().addShutdownHook(shutdownHook, 0)
    --- End diff --
    
    Hi, @srowen , thank you for the question.
    
    Unfortunately, it won't work. `Client.stop()` is invoked in client mode by Yarn**Client**SchedulerBackend.stop(), but it is not invoked in cluster mode. Furthermore, I can't call it from Yarn**Cluster**SchedulerBackend.stop() because Yarn**Cluster**SchedulerBackend runs in the AM, not in the client.
    
    In addition, a shutdown hook seems like the most effective way to handle all possible interruption points. Imagine that the client can be killed at any time among [a], [b], and [c]:
    
    1. Client starts.
       * [a] <---- Kill here
    2. Client submits the application.
       * [b] <---- Kill here / AM is ACCEPTED
    3. Client waits for the application to be accepted.
       * [c] <---- Kill here / AM is RUNNING
    4. Client waits for the application to be finished.
    
    If the client is killed at [b], a shutdown hook is the best way to catch it.


