Posted to reviews@spark.apache.org by viper-kun <gi...@git.apache.org> on 2015/04/15 11:41:32 UTC

[GitHub] spark pull request: [Spark-6924]Fix client hands in yarn-client mo...

GitHub user viper-kun opened a pull request:

    https://github.com/apache/spark/pull/5523

    [Spark-6924]Fix client hands in yarn-client mode when net is broken

    https://issues.apache.org/jira/browse/SPARK-6924

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viper-kun/spark spark-6924

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5523
    
----
commit 9288f220a6b7e4ea0d938a2ec4fbfb201de1aa71
Author: xukun 00228947 <xu...@huawei.com>
Date:   2015-04-15T09:40:28Z

    fix client hands in yarn-client mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5523#discussion_r28494320
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -75,6 +75,8 @@ import org.apache.spark.util._
      */
     class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
     
    +  var isInited: Boolean = false
    --- End diff --
    
    This is going to conflict with the overhaul of the SparkContext constructor. I don't see why this depends on the constructor finishing since where you reference the SparkContext, it has been constructed. I also don't think that a lack of executor messages indicates a disconnection; it's not possible to distinguish from temporary loss of connectivity this way. I think you'd have to explain this a lot more (with tests) or close this.




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-95764614
  
    OK. I will close it.




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-93615543
  
    @srowen I updated the JIRA. Please review it. Thanks.




[GitHub] spark pull request: [Spark-6924]Fix client hands in yarn-client mo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-93286548
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-95577801
  
    I think https://github.com/apache/spark/pull/5663 stands a better chance of addressing the issue and being merged. Do you mind commenting on that one, and closing this PR?




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-93728497
  
    During construction, it is normal for the driver not to have heard from any executors. Only after construction have executors connected and started sending heartbeats to the driver, so only then can we tell whether there is a disconnection. SparkContext.isInited shows whether SparkContext construction has completed.
    
    >>> I don't think you can call stop() just because you didn't hear from executors recently.
    
    Is there any better way?




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5523#discussion_r28501942
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -75,6 +75,8 @@ import org.apache.spark.util._
      */
     class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
     
    +  var isInited: Boolean = false
    --- End diff --
    
    >>>This is going to conflict with the overhaul of the SparkContext constructor. I don't see why this depends on the constructor finishing since where you reference the SparkContext, it has been constructed.
    
    In the SparkContext constructor, once the HeartbeatReceiver has been created, timeoutCheckingThread starts checking for expired dead hosts. If executorLastSeen is empty, it calls sc.stop(), which then throws an exception:
    java.lang.NullPointerException
            at org.apache.spark.SparkContext.stop(SparkContext.scala:1416)
            at org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:134)
            at org.apache.spark.HeartbeatReceiver$$anonfun$receive$1.applyOrElse(HeartbeatReceiver.scala:92)
            at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:176)
            at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:125)
            at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:196)
            at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:124)
            at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
            at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
            at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
            at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
            at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
            at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
            at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
            at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
            at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:91)
            at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
            at akka.actor.ActorCell.invoke(ActorCell.scala:487)
            at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
            at akka.dispatch.Mailbox.run(Mailbox.scala:220)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
            at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    15/04/16 15:12:16 INFO netty.NettyBlockTransferService: Server created on 53493
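
For illustration only: the trace above is consistent with stop() running while the constructor is still in progress. The sketch below reproduces that general failure mode in plain Scala and is not Spark code. The names `InitOrderDemo`, `Context`, `Resource`, and `earlyStopError` are hypothetical, and the early call is made inline instead of from a timer thread so that the outcome is deterministic.

```scala
object InitOrderDemo {
  // Hypothetical stand-in for a component that stop() needs to shut down.
  final class Resource { def close(): Unit = () }

  final class Context(stopEarly: Boolean) {
    // Declared first so it is already initialized when the early stop() runs.
    var earlyStopError: Option[Throwable] = None

    // Stands in for a component created early in the constructor that can
    // trigger stop() before the remaining fields have been assigned.
    if (stopEarly) {
      try stop() catch { case e: NullPointerException => earlyStopError = Some(e) }
    }

    // Initialized after the early call above, so stop() saw it as null.
    private val resource: Resource = new Resource

    def stop(): Unit = resource.close()
  }
}
```

Constructing `new InitOrderDemo.Context(stopEarly = true)` records a NullPointerException in `earlyStopError`, while calling `stop()` on a fully constructed instance succeeds.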





[GitHub] spark pull request: [Spark-6924]Fix client hands in yarn-client mo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-93288547
  
    @viper-kun I closed the JIRA to make a point since it did not describe a problem. Do you mean "hangs"? Still, I think you need to elaborate somewhere on what the nature of the problem is and why this is a resolution.




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5523#issuecomment-93716364
  
    Have a look at https://github.com/apache/spark/commit/de4fa6b6d12e2bee0307ffba2abfca0c33f15e45 which may resolve the issue of bad state after construction.
    
    I don't think it's correct to make callers check some status of `SparkContext` to decide whether calling a method is safe. `stop()` should handle this. I don't think you can call `stop()` just because you didn't hear from executors recently.
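
As a rough sketch of the idea that `stop()` itself should cope with any state, the following plain-Scala example makes stop() null-safe and idempotent, so callers never need to consult a flag such as `isInited`. It is not taken from Spark or from the commit linked above; `SafeStopDemo`, `Context`, `Service`, `init`, and `serviceRunning` are hypothetical names chosen for the illustration.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object SafeStopDemo {
  // Hypothetical stand-in for a component the context owns.
  final class Service {
    @volatile private var running = true
    def shutdown(): Unit = { running = false }
    def isRunning: Boolean = running
  }

  final class Context {
    private val stopped = new AtomicBoolean(false)

    // Remains null until init() has run.
    @volatile private var service: Service = null

    def init(): Unit = { service = new Service }

    def stop(): Unit = {
      // compareAndSet lets only the first call do any work.
      if (stopped.compareAndSet(false, true)) {
        // Option(null) is None, so a missing service is skipped quietly.
        Option(service).foreach(_.shutdown())
      }
    }

    def isStopped: Boolean = stopped.get()
    def serviceRunning: Boolean = Option(service).exists(_.isRunning)
  }
}
```

With this shape, calling `stop()` before `init()` or calling it twice is harmless. The sketch does not address a later `init()` after `stop()`, which a real implementation would also have to decide how to handle.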




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5523#discussion_r28502410
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -75,6 +75,8 @@ import org.apache.spark.util._
      */
     class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
     
    +  var isInited: Boolean = false
    --- End diff --
    
    >>> I also don't think that a lack of executor messages indicates a disconnection; it's not possible to distinguish from temporary loss of connectivity this way.
    
    All executors send heartbeats to the driver at a fixed rate. If all executors have expired over a period of time, I think there is a disconnection. Is there a better way to distinguish this from a temporary loss of connectivity?
    Could we check several times? If all executors are still expired, we treat it as a disconnection.
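
The "check several times" suggestion could be sketched roughly as below: a disconnection is reported only after every known executor has been expired for several consecutive checks. This is a hypothetical illustration, not code from the PR or from Spark; `HeartbeatMonitor`, `timeoutMs`, and `requiredMisses` are assumed names. It only makes false positives less likely and still cannot tell a long temporary outage from a permanent one.

```scala
import scala.collection.mutable

object DisconnectCheckDemo {
  final class HeartbeatMonitor(timeoutMs: Long, requiredMisses: Int) {
    // Last heartbeat time per executor, in milliseconds.
    private val lastSeen = mutable.Map.empty[String, Long]
    // Number of checks in a row in which every executor was expired.
    private var consecutiveMisses = 0

    def heartbeat(executorId: String, nowMs: Long): Unit =
      lastSeen.update(executorId, nowMs)

    // Runs one periodic check. Returns true once all known executors have
    // been expired for `requiredMisses` checks in a row.
    def check(nowMs: Long): Boolean = {
      // An empty map (no executor registered yet) never counts as a miss.
      val allExpired =
        lastSeen.nonEmpty && lastSeen.values.forall(seen => nowMs - seen > timeoutMs)
      consecutiveMisses = if (allExpired) consecutiveMisses + 1 else 0
      consecutiveMisses >= requiredMisses
    }
  }
}
```

Any heartbeat that arrives in time resets the counter on the next check, so a brief network interruption does not trigger the disconnection path.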




[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...

Posted by viper-kun <gi...@git.apache.org>.
Github user viper-kun closed the pull request at:

    https://github.com/apache/spark/pull/5523

