Posted to reviews@spark.apache.org by liyichao <gi...@git.apache.org> on 2017/05/24 09:19:13 UTC

[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

GitHub user liyichao opened a pull request:

    https://github.com/apache/spark/pull/18084

    [SPARK-19900][core]Remove driver when relaunching.

    This is https://github.com/apache/spark/pull/17888 .
    
    cc @cloud-fan @jiangxb1987

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyichao/spark SPARK-19900-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18084.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18084
    


[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r118424700
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -796,9 +796,12 @@ private[deploy] class Master(
       }
     
       private def relaunchDriver(driver: DriverInfo) {
    -    driver.worker = None
    -    driver.state = DriverState.RELAUNCHING
    -    waitingDrivers += driver
    +    removeDriver(driver.id, DriverState.RELAUNCHING, None)
    +    val newDriver = createDriver(driver.desc)
    --- End diff --
    
    First, we must distinguish the original driver from the newly relaunched one, because status updates for both versions will arrive at the master. For example, when the network-partitioned worker reconnects to the master, it will send `DriverStateChanged` with the driver id, and the master must recognize whether it is the state of the original driver or of the newly launched one.
    
    The patch simply chooses a new driver id to do this, which has some shortcomings, however. For example, in the UI the two versions of the driver are not related, and the final state of the original driver is `RELAUNCHING` (it would arguably read better as relaunched).
    
    Another way is to add something like an `attemptId` to the driver state, and then let `DriverStateChanged` carry the attemptId to indicate which attempt it refers to. This seems more complex.
    
    What's your opinion?
    
    It also seems hard to add something like an `attemptId` to the persistent driver state. Looking forward to your opinions.
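
    For illustration, a minimal sketch of the `attemptId` alternative; the message shape and field names here are assumptions for discussion, not Spark's actual API:

        // Hypothetical attemptId-based disambiguation (assumed types, not real Spark code).
        final case class DriverStateChanged(driverId: String, attemptId: Int, state: String)
        final case class DriverInfo(id: String, var attemptId: Int, var state: String)

        val driver = DriverInfo("driverId1", attemptId = 2, state = "RUNNING") // relaunched attempt

        def onDriverStateChanged(msg: DriverStateChanged): Unit =
          if (msg.driverId == driver.id && msg.attemptId == driver.attemptId) {
            driver.state = msg.state // the report refers to the current attempt: apply it
          } // else: a stale report from an earlier attempt, drop it

    The new-id approach in the patch gets the same effect for free, since a stale report carries an id the master no longer tracks.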




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #77423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77423/testReport)** for PR 18084 at commit [`da0f977`](https://github.com/apache/spark/commit/da0f977f2846d7051102a32521a1704dca74ed12).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    I think your point is: if a LaunchDriver message and a KillDriver message are sent out simultaneously, there is a race condition, because which message arrives at the worker first is not determined. If the KillDriver message arrives later, we end up with a finished driver instead of a running driver.
    So I'm not against the idea of issuing a new driver id on relaunch, because it is kind of a new epoch that helps us bypass the nested race condition issue (see the sketch below). But please be sure to add comments illustrating the condition in detail, and add test cases to cover this.
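
    A self-contained toy of the epoch idea (heavily simplified master state, not the real Master code):

        import scala.collection.mutable

        // Drivers the master currently tracks, keyed by id. Relaunching retired
        // "driverId1" and registered a fresh id, effectively a new epoch.
        val tracked = mutable.Map("driverId1-relaunch" -> "RUNNING")

        def onDriverStateChanged(driverId: String, state: String): Unit =
          tracked.get(driverId) match {
            case Some(_) => tracked(driverId) = state // current attempt: apply the update
            case None    => println(s"ignoring stale update for $driverId") // old epoch
          }

        onDriverStateChanged("driverId1", "KILLED")            // late message: harmless
        onDriverStateChanged("driverId1-relaunch", "FINISHED") // applied normally
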
    Also cc @cloud-fan FYI




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r118376642
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -796,9 +796,12 @@ private[deploy] class Master(
       }
     
       private def relaunchDriver(driver: DriverInfo) {
    -    driver.worker = None
    -    driver.state = DriverState.RELAUNCHING
    -    waitingDrivers += driver
    +    removeDriver(driver.id, DriverState.RELAUNCHING, None)
    +    val newDriver = createDriver(driver.desc)
    --- End diff --
    
    Do you have a good reason to remove and create the driver in this case? It looks like some kind of overkill compared to the old logic.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #77353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77353/testReport)** for PR 18084 at commit [`6ab9a0f`](https://github.com/apache/spark/commit/6ab9a0f5a62e4ec3c9757f748ff5e278d4741d25).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class MockWorker(master: RpcEndpointRef, conf: SparkConf = new SparkConf) extends RpcEndpoint `




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    OK, another scenario:
    
    * driver with driverId1 is started on worker1
    * worker1 is lost
    * master adds driverId1 to waitingDrivers
    * worker1 reconnects and sends DriverStateChanged(driverId1), but the message is delayed in the network, **and worker1 removes driverId1 from its local state**
    * master starts driverId1 on worker1
    * master receives the delayed message




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Hi, adding a workerId may not work. For example, consider this scenario:
    
    * driver with driverId1 is started on worker1
    * worker1 is lost
    * master adds driverId1 to waitingDrivers
    * worker1 reconnects and sends DriverStateChanged(driverId1), but the message is delayed in the network
    * master starts driverId1 on worker1
    * master receives the delayed message
    
    Now, what should the master do?
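
    To make the ambiguity concrete, a toy sketch (simplified state, not the real Master): once the id is reused, the delayed message is indistinguishable from a genuine update for the relaunched driver:

        import scala.collection.mutable

        val running = mutable.Set("driverId1") // the relaunched attempt, back on worker1

        def onDriverStateChanged(driverId: String, state: String): Unit =
          if (running.contains(driverId)) {
            // Is this the old attempt's delayed report, or the new attempt
            // changing state? With a shared id there is no way to tell, so the
            // master may wrongly remove the healthy relaunched driver here.
            running -= driverId
          }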




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r121881538
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -588,6 +633,70 @@ class MasterSuite extends SparkFunSuite
         }
       }
     
    +  test("SPARK-19900: there should be a corresponding driver for the app after relaunching driver") {
    +    val conf = new SparkConf().set("spark.worker.timeout", "1")
    +    val master = makeMaster(conf)
    +    master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    --- End diff --
    
    shall we move this out of the `eventually {...}` block?




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #77423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77423/testReport)** for PR 18084 at commit [`da0f977`](https://github.com/apache/spark/commit/da0f977f2846d7051102a32521a1704dca74ed12).




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    ok to test




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Thanks for the reply. I have added some more tests to verify the state of the master and the worker after relaunching.
    
    I will think about whether there are ways to reuse the old driver struct.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    the fix LGTM; it would be better to add some comments to explain it clearly




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #78023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78023/testReport)** for PR 18084 at commit [`0887eab`](https://github.com/apache/spark/commit/0887eab874dac34b99c6dec58c979b3f8ee3feb7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class KVStoreSerializer `
      * `public abstract class KVStoreView<T> implements Iterable<T> `
      * `public class KVTypeInfo `
      * `public class LevelDB implements KVStore `
      * `  public static class TypeAliases `
      * `class LevelDBIterator<T> implements KVStoreIterator<T> `
      * `class LevelDBTypeInfo `
      * `  class Index `
      * `public class UnsupportedStoreVersionException extends IOException `
      * `          logError(s"Not measuring processing time for listener class $className because a " +`
      * `class FilteredObjectInputStream extends ObjectInputStream `
      * `        String.format("Unexpected class in stream: %s", desc.getName()));`
      * `class HasMinSupport(Params):`
      * `class HasNumPartitions(Params):`
      * `class HasMinConfidence(Params):`
      * `case class UnresolvedRelation(`
      * `case class DayOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
      * `case class Uuid() extends LeafExpression `
      * `case class StringReplace(srcExpr: Expression, searchExpr: Expression, replaceExpr: Expression)`
      * `case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
      * `trait Command extends LogicalPlan `
      * `case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan)`
      * `case class ResolvedHint(child: LogicalPlan, hints: HintInfo = HintInfo())`
      * `case class HintInfo(broadcast: Boolean = false) `
      * `public final class ParquetDictionary implements Dictionary `
      * `case class ExecutedCommandExec(cmd: RunnableCommand, children: Seq[SparkPlan]) extends SparkPlan `
      * `class RateSourceProvider extends StreamSourceProvider with DataSourceRegister `
      * `class RateStreamSource(`
      * `case class StateStoreId(`
      * `class UnsafeRowPair(var key: UnsafeRow = null, var value: UnsafeRow = null) `
      * `trait StateStoreWriter extends StatefulOperator `




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Hi, I've thought more thoroughly about this.
    
    The main state involved here is Master.workers, Master.idToWorker, and WorkerInfo.drivers. Say `driverId1` runs on worker A. Assume A is network partitioned: the master calls removeWorker, which sets the worker's state to DEAD and removes the worker from the persistenceEngine, but does not remove it from Master.workers. The master then launches the driver on worker B.
    
    When A reconnects, it re-registers with the master, and the master removes the old WorkerInfo (whose `drivers` field is not empty) and adds a new WorkerInfo (say `wf_A`) whose drivers are empty. After registering, the worker re-syncs state with the master by sending `WorkerLatestState` containing `driverId1`. The master does not find it in `wf_A.drivers`, so it asks worker A to kill it. After killing the driver, worker A sends `DriverStateChanged(driverId1, DriverState.KILLED)`, and the master then mistakenly removes `driverId1`, which is now running on worker B.
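
    A toy model of that reconciliation step (simplified types; the real handler lives in the Master's message loop):

        final case class WorkerLatestState(workerId: String, driverIds: Seq[String])

        // After re-registration, the fresh WorkerInfo for A assigns no drivers:
        val assignedToWorkerA = Set.empty[String]

        // Every driver the worker reports but the master no longer assigns to it
        // gets a KillDriver request, including driverId1, which now runs on worker B.
        def driversToKill(msg: WorkerLatestState): Seq[String] =
          msg.driverIds.filterNot(assignedToWorkerA.contains)

        driversToKill(WorkerLatestState("workerA", Seq("driverId1"))) // Seq("driverId1")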
    
    How can the master recognize that the `DriverStateChanged` came from worker A, not worker B? Maybe we can add a `workerId` field to `DriverStateChanged`, but is it possible that the second run of `driverId1` lands on worker A? Consider the following scenario:
    
    1. worker A is network partitioned
    2. master puts `driverId1` into waitingDrivers
    3. worker A reconnects and registers
    4. master launches `driverId1` on worker A
    5. worker A's `WorkerLatestState(_, _, Seq(driverId1))` arrives at the master
    
    Now, how does worker A handle the `LaunchDriver(driverId1)` when it is already running a driver with `driverId1`? How does the master process the `WorkerLatestState`? With the above message order, the master will send `KillDriver` to worker A, the worker will kill `driverId1` (which is the relaunched one) and send `DriverStateChanged` to the master, and the master will relaunch it again...
    
    After all this, I think it is better to relaunch the driver with a new id, to keep things simple. As for the cost, `removeDriver` will be called anyway; if not here, it will be called when the `DriverStateChanged` arrives. The `persistenceEngine` has to be touched because the persistent state `driver.id` changed, so the cost is justified. And `relaunchDriver` is only called when a worker or the master goes down, which should be rare, since framework code is more stable than application code and such failures are less likely.




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18084




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #77312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77312/testReport)** for PR 18084 at commit [`9ea2061`](https://github.com/apache/spark/commit/9ea20611f28e845e9c74626aee3e191656fa01bb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78049/
    Test PASSed.




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r118436939
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -499,4 +500,103 @@ class MasterSuite extends SparkFunSuite
           assert(receivedMasterAddress === RpcAddress("localhost2", 10000))
         }
       }
    +
    +  test("SPARK-19900: there should be a corresponding driver for the app after relaunching driver") {
    +    val conf = new SparkConf().set("spark.worker.timeout", "1")
    +    val master = makeMaster(conf)
    +    master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    +      assert(masterState.status === RecoveryState.ALIVE, "Master is not alive")
    +    }
    +
    +    val app = DeployTestUtils.createAppDesc()
    +    var appId = ""
    +    val driverEnv1 = RpcEnv.create("driver1", "localhost", 22344, conf, new SecurityManager(conf))
    +    val fakeDriver1 = driverEnv1.setupEndpoint("driver", new RpcEndpoint {
    +      override val rpcEnv: RpcEnv = driverEnv1
    +      override def receive: PartialFunction[Any, Unit] = {
    +        case RegisteredApplication(id, _) => appId = id
    +      }
    +    })
    +    val drivers = new HashMap[String, String]
    +    val workerEnv1 = RpcEnv.create("worker1", "localhost", 12344, conf, new SecurityManager(conf))
    +    val fakeWorker1 = workerEnv1.setupEndpoint("worker", new RpcEndpoint {
    +      override val rpcEnv: RpcEnv = workerEnv1
    +      override def receive: PartialFunction[Any, Unit] = {
    +        case RegisteredWorker(masterRef, _, _) =>
    +          masterRef.send(WorkerLatestState("1", Nil, drivers.keys.toSeq))
    +        case LaunchDriver(id, desc) =>
    +          drivers(id) = id
    +          master.self.send(RegisterApplication(app, fakeDriver1))
    +        case KillDriver(driverId) =>
    +          master.self.send(DriverStateChanged(driverId, DriverState.KILLED, None))
    +          drivers.remove(driverId)
    +      }
    +    })
    +    val worker1 = RegisterWorker(
    +      "1",
    +      "localhost",
    +      9999,
    +      fakeWorker1,
    +      10,
    +      1024,
    +      "http://localhost:8080",
    +      RpcAddress("localhost2", 10000))
    +    master.self.send(worker1)
    +    val driver = DeployTestUtils.createDriverDesc().copy(supervise = true)
    +    master.self.askSync[SubmitDriverResponse](RequestSubmitDriver(driver))
    +
    +    eventually(timeout(10.seconds)) {
    +      assert(!appId.isEmpty)
    +    }
    +
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    +      assert(masterState.workers(0).state == WorkerState.DEAD)
    +    }
    +
    +    val driverEnv2 = RpcEnv.create("driver2", "localhost", 22345, conf, new SecurityManager(conf))
    +    val fakeDriver2 = driverEnv2.setupEndpoint("driver", new RpcEndpoint {
    --- End diff --
    
    updated, please have a look.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #78049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78049/testReport)** for PR 18084 at commit [`9ddf23a`](https://github.com/apache/spark/commit/9ddf23af3ee5853c5d1b53a05afc38f38509a8c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r121902338
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -588,6 +633,70 @@ class MasterSuite extends SparkFunSuite
         }
       }
     
    +  test("SPARK-19900: there should be a corresponding driver for the app after relaunching driver") {
    +    val conf = new SparkConf().set("spark.worker.timeout", "1")
    +    val master = makeMaster(conf)
    +    master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    --- End diff --
    
    Hi, this cannot be moved out, because the `MasterStateResponse` changes over time. If we move the RPC out of the `eventually` block, `masterState` will never change and the assert will keep failing.
    
    See the test `SPARK-20529:...` above, which uses the same `eventually` assert.
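
    In other words, the `askSync` must be re-issued on every retry so that each assertion sees a fresh snapshot; hoisting it out freezes a single stale response. A sketch based on the quoted test:

        eventually(timeout(10.seconds)) {
          // fetched anew on every retry, so it can observe the state change
          val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
          assert(masterState.workers(0).state == WorkerState.DEAD)
        }

        // Broken variant: masterState is captured once and never changes, so the
        // block retries until the timeout and the test fails.
        // val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
        // eventually(timeout(10.seconds)) {
        //   assert(masterState.workers(0).state == WorkerState.DEAD)
        // }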




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77423/
    Test PASSed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77353/
    Test FAILed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Maybe some more actions should be taken in `relaunchDriver()`, such as having `driver.worker` drop its reference to the relaunched driver; but removing and later creating a new driver is sort of a waste of resources, and we should generally avoid doing such things.
    
    Now, to help us step forward, would you like to spend some time creating a valid regression test case? That will help a lot when we discuss the proper bug-fix proposal further.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    We should also check in the Worker that we don't launch duplicate drivers; I think the logic should be added to the handling of the `LaunchDriver` message. A sketch of such a guard follows.
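
    A hypothetical sketch of that guard, with the worker's state boiled down to a map (this exact check is a suggestion, not committed code):

        import scala.collection.mutable

        val drivers = mutable.Map.empty[String, String] // driverId -> stand-in for a DriverRunner

        def onLaunchDriver(driverId: String): Unit =
          if (drivers.contains(driverId)) {
            println(s"Driver $driverId already running; ignoring duplicate LaunchDriver")
          } else {
            drivers(driverId) = "RUNNING" // stand-in for creating and starting a DriverRunner
          }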




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #78049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78049/testReport)** for PR 18084 at commit [`9ddf23a`](https://github.com/apache/spark/commit/9ddf23af3ee5853c5d1b53a05afc38f38509a8c2).




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r121881612
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -588,6 +633,70 @@ class MasterSuite extends SparkFunSuite
         }
       }
     
    +  test("SPARK-19900: there should be a corresponding driver for the app after relaunching driver") {
    +    val conf = new SparkConf().set("spark.worker.timeout", "1")
    +    val master = makeMaster(conf)
    +    master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    +      assert(masterState.status === RecoveryState.ALIVE, "Master is not alive")
    +    }
    +    val worker1 = new MockWorker(master.self)
    +    worker1.rpcEnv.setupEndpoint("worker", worker1)
    +    val worker1Reg = RegisterWorker(
    +      worker1.id,
    +      "localhost",
    +      9998,
    +      worker1.self,
    +      10,
    +      1024,
    +      "http://localhost:8080",
    +      RpcAddress("localhost2", 10000))
    +    master.self.send(worker1Reg)
    +    val driver = DeployTestUtils.createDriverDesc().copy(supervise = true)
    +    master.self.askSync[SubmitDriverResponse](RequestSubmitDriver(driver))
    +
    +    eventually(timeout(10.seconds)) {
    +      assert(worker1.apps.nonEmpty)
    +    }
    +
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    --- End diff --
    
    ditto




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #78023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78023/testReport)** for PR 18084 at commit [`0887eab`](https://github.com/apache/spark/commit/0887eab874dac34b99c6dec58c979b3f8ee3feb7).




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78023/
    Test PASSed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #77312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77312/testReport)** for PR 18084 at commit [`9ea2061`](https://github.com/apache/spark/commit/9ea20611f28e845e9c74626aee3e191656fa01bb).




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r121880867
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -40,6 +42,49 @@ import org.apache.spark.deploy.DeployMessages._
     import org.apache.spark.rpc.{RpcAddress, RpcEndpoint, RpcEndpointRef, RpcEnv}
     import org.apache.spark.serializer
     
    +object MockWorker {
    +  val counter = new AtomicInteger(10000)
    +}
    +
    +class MockWorker(master: RpcEndpointRef, conf: SparkConf = new SparkConf) extends RpcEndpoint {
    +  val seq = MockWorker.counter.incrementAndGet()
    +  val id = seq.toString
    +  override val rpcEnv: RpcEnv = RpcEnv.create("worker", "localhost", seq,
    +    conf, new SecurityManager(conf))
    +  var apps = new mutable.HashMap[String, String]()
    +  val driverIdToAppId = new mutable.HashMap[String, String]()
    +  def newDriver(driverId: String): RpcEndpointRef = {
    +    val name = s"driver_${drivers.size}"
    +    rpcEnv.setupEndpoint(name, new RpcEndpoint {
    +      override val rpcEnv: RpcEnv = MockWorker.this.rpcEnv
    +      override def receive: PartialFunction[Any, Unit] = {
    +        case RegisteredApplication(appId, _) =>
    +          apps(appId) = appId
    +          driverIdToAppId(driverId) = appId
    +      }
    +    })
    +  }
    +
    +  val appDesc = DeployTestUtils.createAppDesc()
    +  val drivers = new mutable.HashMap[String, String]
    +  override def receive: PartialFunction[Any, Unit] = {
    +    case RegisteredWorker(masterRef, _, _) =>
    +      masterRef.send(WorkerLatestState(id, Nil, drivers.keys.toSeq))
    +    case LaunchDriver(driverId, desc) =>
    +      drivers(driverId) = driverId
    --- End diff --
    
    seems `drivers` can be a set instead of a map?
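
    For reference, the suggested simplification might look like this (a sketch; `MockWorker` only ever maps each id to itself):

        val drivers = new mutable.HashSet[String]  // instead of mutable.HashMap[String, String]

        // and inside receive:
        //   case LaunchDriver(driverId, desc) =>
        //     drivers += driverId                 // instead of drivers(driverId) = driverId
        //   with WorkerLatestState(id, Nil, drivers.toSeq) instead of drivers.keys.toSeq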




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    **[Test build #77353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77353/testReport)** for PR 18084 at commit [`6ab9a0f`](https://github.com/apache/spark/commit/6ab9a0f5a62e4ec3c9757f748ff5e278d4741d25).




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    ping @jiangxb1987 




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    thanks, merging to master!




[GitHub] spark pull request #18084: [SPARK-19900][core]Remove driver when relaunching...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18084#discussion_r118376781
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
    @@ -499,4 +500,103 @@ class MasterSuite extends SparkFunSuite
           assert(receivedMasterAddress === RpcAddress("localhost2", 10000))
         }
       }
    +
    +  test("SPARK-19900: there should be a corresponding driver for the app after relaunching driver") {
    +    val conf = new SparkConf().set("spark.worker.timeout", "1")
    +    val master = makeMaster(conf)
    +    master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    +      assert(masterState.status === RecoveryState.ALIVE, "Master is not alive")
    +    }
    +
    +    val app = DeployTestUtils.createAppDesc()
    +    var appId = ""
    +    val driverEnv1 = RpcEnv.create("driver1", "localhost", 22344, conf, new SecurityManager(conf))
    +    val fakeDriver1 = driverEnv1.setupEndpoint("driver", new RpcEndpoint {
    +      override val rpcEnv: RpcEnv = driverEnv1
    +      override def receive: PartialFunction[Any, Unit] = {
    +        case RegisteredApplication(id, _) => appId = id
    +      }
    +    })
    +    val drivers = new HashMap[String, String]
    +    val workerEnv1 = RpcEnv.create("worker1", "localhost", 12344, conf, new SecurityManager(conf))
    +    val fakeWorker1 = workerEnv1.setupEndpoint("worker", new RpcEndpoint {
    +      override val rpcEnv: RpcEnv = workerEnv1
    +      override def receive: PartialFunction[Any, Unit] = {
    +        case RegisteredWorker(masterRef, _, _) =>
    +          masterRef.send(WorkerLatestState("1", Nil, drivers.keys.toSeq))
    +        case LaunchDriver(id, desc) =>
    +          drivers(id) = id
    +          master.self.send(RegisterApplication(app, fakeDriver1))
    +        case KillDriver(driverId) =>
    +          master.self.send(DriverStateChanged(driverId, DriverState.KILLED, None))
    +          drivers.remove(driverId)
    +      }
    +    })
    +    val worker1 = RegisterWorker(
    +      "1",
    +      "localhost",
    +      9999,
    +      fakeWorker1,
    +      10,
    +      1024,
    +      "http://localhost:8080",
    +      RpcAddress("localhost2", 10000))
    +    master.self.send(worker1)
    +    val driver = DeployTestUtils.createDriverDesc().copy(supervise = true)
    +    master.self.askSync[SubmitDriverResponse](RequestSubmitDriver(driver))
    +
    +    eventually(timeout(10.seconds)) {
    +      assert(!appId.isEmpty)
    +    }
    +
    +    eventually(timeout(10.seconds)) {
    +      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
    +      assert(masterState.workers(0).state == WorkerState.DEAD)
    +    }
    +
    +    val driverEnv2 = RpcEnv.create("driver2", "localhost", 22345, conf, new SecurityManager(conf))
    +    val fakeDriver2 = driverEnv2.setupEndpoint("driver", new RpcEndpoint {
    --- End diff --
    
    I believe these duplicate code can be combined.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    LGTM, pending test




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77312/
    Test FAILed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18084
  
    Also, please rebase onto the latest master :)

