You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Alexander Rukletsov <ru...@gmail.com> on 2017/12/15 10:43:45 UTC

Re: Review Request 64069: Ensured command executor always honors shutdown request.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/
-----------------------------------------------------------

(Updated Dec. 15, 2017, 10:43 a.m.)


Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.


Repository: mesos


Description
-------

See summary.


Diffs
-----

  src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 


Diff: https://reviews.apache.org/r/64069/diff/1/


Testing
-------

See https://reviews.apache.org/r/64033/


Thanks,

Alexander Rukletsov


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Dec. 15, 2017, 4:48 p.m., Vinod Kone wrote:
> > src/launcher/executor.cpp
> > Lines 768 (patched)
> > <https://reviews.apache.org/r/64069/diff/1/?file=1901225#file1901225line768>
> >
> >     Can you add a test for this?

Writing test is tricky: our test harness does not allow to drop outgoing messages and hence `RunTaskMessage` from the agent to executor can't be dropped (since executor is running in a separate process it can't be dropped there as incoming message either). We should consider either improving our harness or providing an ability to create executors in test processes. For now, I've done manual testing by dropping the message in exec.cpp.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review193931
-----------------------------------------------------------


On Dec. 15, 2017, 10:43 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2017, 10:43 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/64033/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Vinod Kone <vi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review193931
-----------------------------------------------------------




src/launcher/executor.cpp
Lines 768 (patched)
<https://reviews.apache.org/r/64069/#comment272599>

    Can you add a test for this?


- Vinod Kone


On Dec. 15, 2017, 10:43 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2017, 10:43 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/64033/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review194257
-----------------------------------------------------------



PASS: Mesos patch 64069 was successfully built and tested.

Reviews applied: `['64032', '64069']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64069

- Mesos Reviewbot Windows


On Dec. 20, 2017, 4:07 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 20, 2017, 4:07 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> Writing a proper test is tricky: our test harness does not allow to drop outgoing messages and hence `RunTaskMessage` from the agent to the executor can't be dropped (since executor is running in a separate process it can't be dropped there as incoming message either). So I've tested this manually via dropping `RunTaskMessage` in `exec.cpp` and using the following test:
> ```
> TEST_P(CommandExecutorTest, ShutdownWithNoLaunch)
> {
>   Try<Owned<cluster::Master>> master = StartMaster();
>   ASSERT_SOME(master);
> 
>   Owned<MasterDetector> detector = master.get()->createDetector();
> 
>   slave::Flags flags = CreateSlaveFlags();
>   flags.http_command_executor = false;
> 
>   Try<Owned<cluster::Slave>> slave = StartSlave(detector.get(), flags);
>   ASSERT_SOME(slave);
> 
>   MockScheduler sched;
>   MesosSchedulerDriver driver(
>       &sched, DEFAULT_FRAMEWORK_INFO, master.get()->pid, DEFAULT_CREDENTIAL);
> 
>   EXPECT_CALL(sched, registered(&driver, _, _));
> 
>   Future<vector<Offer>> offers;
>   EXPECT_CALL(sched, resourceOffers(&driver, _))
>     .WillOnce(FutureArg<1>(&offers))
>     .WillRepeatedly(Return()); // Ignore subsequent offers.
> 
>   driver.start();
> 
>   AWAIT_READY(offers);
>   EXPECT_EQ(1u, offers->size());
> 
>   // Launch a task with the command executor.
>   TaskInfo task = createTask(
>       offers->front().slave_id(),
>       offers->front().resources(),
>       SLEEP_COMMAND(1000));
> 
>   driver.launchTasks(offers->front().id(), {task});
> 
>   driver.stop();
>   driver.join();
> }
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Andrei Budnik <ab...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review194337
-----------------------------------------------------------


Ship it!




Ship It!

- Andrei Budnik


On Dec. 20, 2017, 8:14 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 20, 2017, 8:14 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp 31a47106d7f511220afba4fb382c9252d4671f6e 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/2/
> 
> 
> Testing
> -------
> 
> Writing a proper test is tricky: our test harness does not allow to drop outgoing messages and hence `RunTaskMessage` from the agent to the executor can't be dropped (since executor is running in a separate process it can't be dropped there as incoming message either). So I've tested this manually via dropping `RunTaskMessage` in `exec.cpp` and using the following test:
> ```
> TEST_P(CommandExecutorTest, ShutdownWithNoLaunch)
> {
>   Try<Owned<cluster::Master>> master = StartMaster();
>   ASSERT_SOME(master);
> 
>   Owned<MasterDetector> detector = master.get()->createDetector();
> 
>   slave::Flags flags = CreateSlaveFlags();
>   flags.http_command_executor = false;
> 
>   Try<Owned<cluster::Slave>> slave = StartSlave(detector.get(), flags);
>   ASSERT_SOME(slave);
> 
>   MockScheduler sched;
>   MesosSchedulerDriver driver(
>       &sched, DEFAULT_FRAMEWORK_INFO, master.get()->pid, DEFAULT_CREDENTIAL);
> 
>   EXPECT_CALL(sched, registered(&driver, _, _));
> 
>   Future<vector<Offer>> offers;
>   EXPECT_CALL(sched, resourceOffers(&driver, _))
>     .WillOnce(FutureArg<1>(&offers))
>     .WillRepeatedly(Return()); // Ignore subsequent offers.
> 
>   driver.start();
> 
>   AWAIT_READY(offers);
>   EXPECT_EQ(1u, offers->size());
> 
>   // Launch a task with the command executor.
>   TaskInfo task = createTask(
>       offers->front().slave_id(),
>       offers->front().resources(),
>       SLEEP_COMMAND(1000));
> 
>   driver.launchTasks(offers->front().id(), {task});
> 
>   driver.stop();
>   driver.join();
> }
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/
-----------------------------------------------------------

(Updated Dec. 20, 2017, 8:14 p.m.)


Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.


Repository: mesos


Description
-------

See summary.


Diffs (updated)
-----

  src/launcher/executor.cpp 31a47106d7f511220afba4fb382c9252d4671f6e 


Diff: https://reviews.apache.org/r/64069/diff/2/

Changes: https://reviews.apache.org/r/64069/diff/1-2/


Testing
-------

Writing a proper test is tricky: our test harness does not allow to drop outgoing messages and hence `RunTaskMessage` from the agent to the executor can't be dropped (since executor is running in a separate process it can't be dropped there as incoming message either). So I've tested this manually via dropping `RunTaskMessage` in `exec.cpp` and using the following test:
```
TEST_P(CommandExecutorTest, ShutdownWithNoLaunch)
{
  Try<Owned<cluster::Master>> master = StartMaster();
  ASSERT_SOME(master);

  Owned<MasterDetector> detector = master.get()->createDetector();

  slave::Flags flags = CreateSlaveFlags();
  flags.http_command_executor = false;

  Try<Owned<cluster::Slave>> slave = StartSlave(detector.get(), flags);
  ASSERT_SOME(slave);

  MockScheduler sched;
  MesosSchedulerDriver driver(
      &sched, DEFAULT_FRAMEWORK_INFO, master.get()->pid, DEFAULT_CREDENTIAL);

  EXPECT_CALL(sched, registered(&driver, _, _));

  Future<vector<Offer>> offers;
  EXPECT_CALL(sched, resourceOffers(&driver, _))
    .WillOnce(FutureArg<1>(&offers))
    .WillRepeatedly(Return()); // Ignore subsequent offers.

  driver.start();

  AWAIT_READY(offers);
  EXPECT_EQ(1u, offers->size());

  // Launch a task with the command executor.
  TaskInfo task = createTask(
      offers->front().slave_id(),
      offers->front().resources(),
      SLEEP_COMMAND(1000));

  driver.launchTasks(offers->front().id(), {task});

  driver.stop();
  driver.join();
}
```


Thanks,

Alexander Rukletsov


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/
-----------------------------------------------------------

(Updated Dec. 20, 2017, 4:07 p.m.)


Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.


Repository: mesos


Description
-------

See summary.


Diffs
-----

  src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 


Diff: https://reviews.apache.org/r/64069/diff/1/


Testing (updated)
-------

Writing a proper test is tricky: our test harness does not allow to drop outgoing messages and hence `RunTaskMessage` from the agent to the executor can't be dropped (since executor is running in a separate process it can't be dropped there as incoming message either). So I've tested this manually via dropping `RunTaskMessage` in `exec.cpp` and using the following test:
```
TEST_P(CommandExecutorTest, ShutdownWithNoLaunch)
{
  Try<Owned<cluster::Master>> master = StartMaster();
  ASSERT_SOME(master);

  Owned<MasterDetector> detector = master.get()->createDetector();

  slave::Flags flags = CreateSlaveFlags();
  flags.http_command_executor = false;

  Try<Owned<cluster::Slave>> slave = StartSlave(detector.get(), flags);
  ASSERT_SOME(slave);

  MockScheduler sched;
  MesosSchedulerDriver driver(
      &sched, DEFAULT_FRAMEWORK_INFO, master.get()->pid, DEFAULT_CREDENTIAL);

  EXPECT_CALL(sched, registered(&driver, _, _));

  Future<vector<Offer>> offers;
  EXPECT_CALL(sched, resourceOffers(&driver, _))
    .WillOnce(FutureArg<1>(&offers))
    .WillRepeatedly(Return()); // Ignore subsequent offers.

  driver.start();

  AWAIT_READY(offers);
  EXPECT_EQ(1u, offers->size());

  // Launch a task with the command executor.
  TaskInfo task = createTask(
      offers->front().slave_id(),
      offers->front().resources(),
      SLEEP_COMMAND(1000));

  driver.launchTasks(offers->front().id(), {task});

  driver.stop();
  driver.join();
}
```


Thanks,

Alexander Rukletsov


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Dec. 18, 2017, 9:40 p.m., Vinod Kone wrote:
> > What about the fix for other built-in executors (docker, default)?
> > 
> > Also, this only seems to fix the shutdown path, what about kill task path?
> > 
> > 
> > Regaring the fix for kill task when a task hasn't been launched from the perspective of the executor, I think doing a CHECK fail makes sense to me because there was violation of an external variant (kill comes after launch). Silently shutting down in such a scenario seems risky because it will mask the extent of the problem. Either way we need the system to heal and correct itself when variants fail; right now the executor ignores the kill and stays up which is a really bad UX. cc @anand

Other executors already behave correctly.
Docker: https://github.com/apache/mesos/blob/5f40f3d401bbdd7e324113214c286a26faf67540/src/docker/executor.cpp#L333
Default: https://github.com/apache/mesos/blob/5f40f3d401bbdd7e324113214c286a26faf67540/src/launcher/default_executor.cpp#L966

Kill task path is addressed here: https://reviews.apache.org/r/64033/

I completely agree with the sentiment about bad UX. I'm fine with doing the `CHECK`; let's continue discussion in the aforementioned RR.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review194088
-----------------------------------------------------------


On Dec. 15, 2017, 10:43 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2017, 10:43 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/64033/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Vinod Kone <vi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review194088
-----------------------------------------------------------



What about the fix for other built-in executors (docker, default)?

Also, this only seems to fix the shutdown path, what about kill task path?


Regaring the fix for kill task when a task hasn't been launched from the perspective of the executor, I think doing a CHECK fail makes sense to me because there was violation of an external variant (kill comes after launch). Silently shutting down in such a scenario seems risky because it will mask the extent of the problem. Either way we need the system to heal and correct itself when variants fail; right now the executor ignores the kill and stays up which is a really bad UX. cc @anand

- Vinod Kone


On Dec. 15, 2017, 10:43 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2017, 10:43 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/64033/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Dec. 19, 2017, 3:25 p.m., Andrei Budnik wrote:
> > src/launcher/executor.cpp
> > Lines 768 (patched)
> > <https://reviews.apache.org/r/64069/diff/1/?file=1901225#file1901225line768>
> >
> >     Let's print warning message before terminating:
> >     ```
> >     LOG(WARNING) << "Attempted to shutdown the executor with no running task";
> >     ```

I'm not sure we should. Shutdown is issued if, e.g., the scheduler exits, which — in contrast to kill task — is not related to the task lifecycle.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review194147
-----------------------------------------------------------


On Dec. 15, 2017, 10:43 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2017, 10:43 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/64033/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 64069: Ensured command executor always honors shutdown request.

Posted by Andrei Budnik <ab...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64069/#review194147
-----------------------------------------------------------




src/launcher/executor.cpp
Lines 768 (patched)
<https://reviews.apache.org/r/64069/#comment272883>

    Let's print warning message before terminating:
    ```
    LOG(WARNING) << "Attempted to shutdown the executor with no running task";
    ```


- Andrei Budnik


On Dec. 15, 2017, 10:43 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64069/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2017, 10:43 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Anand Mazumdar, Armand Grillet, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/launcher/executor.cpp c688c04e598ac140421fd5e47359b0e48d30bcc5 
> 
> 
> Diff: https://reviews.apache.org/r/64069/diff/1/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/64033/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>