You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benjamin Bannier <be...@mesosphere.io> on 2017/01/16 11:28:03 UTC

Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.


Bugs: MESOS-6631
    https://issues.apache.org/jira/browse/MESOS-6631


Repository: mesos


Description
-------

This allows us to confirm that an updated FrameworkInfo is fully
compatible with an already known one. By performing this check in
updateFrameworkInfo when can be sure that both the new and old
FrameworkInfo are available (e.g., after framework reregistration, or
master or framework failover).


Diffs
-----

  src/master/master.hpp 44f4fecb1fbe8bebf830990a59a5462338e6e004 
  src/master/master.cpp b863ff6e93931c3d1ee056248084c7f44caf2fd9 

Diff: https://reviews.apache.org/r/55571/diff/


Testing
-------

Tested as part of https://reviews.apache.org/r/55271/.


Thanks,

Benjamin Bannier


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Jan. 17, 2017, 10:07 a.m., Jay Guo wrote:
> > src/master/master.cpp, lines 7201-7203
> > <https://reviews.apache.org/r/55571/diff/1/?file=1605757#file1605757line7201>
> >
> >     There may be some corner cases that causes this `CHECK` to coredump. For example:
> >     1. Start a Mesos cluster with master, agent and a scheduler with single role `role1`.
> >     2. Master failover
> >     3. Agent re-register before scheduler does
> >     4. Scheduler re-subscribe with multi-role `{role2}`
> >     
> >     In this case, master re-construct FrameworkInfo from re-registered agent, which contains old role. However, in current implementation, `Error` of `updateFrameworkInfo` is not caught for recovered frameworks.
> >     
> >     In theory, this test case shouldn't cause master coredump:
> >     ```cpp
> >       master::Flags masterFlags = CreateMasterFlags();
> >       Try<Owned<cluster::Master>> master = StartMaster(masterFlags);
> >       ASSERT_SOME(master);
> >     
> >       MockExecutor exec(DEFAULT_EXECUTOR_ID);
> >       TestContainerizer containerizer(&exec);
> >     
> >       StandaloneMasterDetector slaveDetector(master.get()->pid);
> >       Try<Owned<cluster::Slave>> slave = StartSlave(&slaveDetector, &containerizer);
> >       ASSERT_SOME(slave);
> >     
> >       FrameworkInfo frameworkInfo = DEFAULT_FRAMEWORK_INFO;
> >       frameworkInfo.set_failover_timeout(1024);
> >       frameworkInfo.set_role("role1");
> >     
> >       Future<FrameworkID> frameworkId;
> >     
> >       {
> >         MockScheduler sched;
> >         MesosSchedulerDriver driver(
> >             &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
> >     
> >         EXPECT_CALL(sched, registered(&driver, _, _))
> >           .WillOnce(FutureArg<1>(&frameworkId));
> >     
> >         Future<vector<Offer>> offers;
> >         EXPECT_CALL(sched, resourceOffers(&driver, _))
> >           .WillOnce(FutureArg<1>(&offers))
> >           .WillRepeatedly(Return()); // Ignore subsequent offers.
> >     
> >         driver.start();
> >     
> >         Clock::pause();
> >         Clock::settle();
> >         Clock::resume();
> >     
> >         AWAIT_READY(frameworkId);
> >     
> >         AWAIT_READY(offers);
> >         EXPECT_FALSE(offers->empty());
> >     
> >         TaskInfo task;
> >         task.set_name("");
> >         task.mutable_task_id()->set_value("1");
> >         task.mutable_slave_id()->MergeFrom(offers.get()[0].slave_id());
> >         task.mutable_resources()->MergeFrom(offers.get()[0].resources());
> >         task.mutable_executor()->MergeFrom(DEFAULT_EXECUTOR_INFO);
> >     
> >         EXPECT_CALL(exec, registered(_, _, _, _))
> >           .Times(1);
> >     
> >         EXPECT_CALL(exec, launchTask(_, _))
> >           .WillOnce(SendStatusUpdateFromTask(TASK_RUNNING));
> >     
> >         Future<TaskStatus> runningStatus;
> >         EXPECT_CALL(sched, statusUpdate(&driver, _))
> >           .WillOnce(FutureArg<1>(&runningStatus));
> >     
> >         driver.launchTasks(offers.get()[0].id(), {task});
> >     
> >         AWAIT_READY(runningStatus);
> >         EXPECT_EQ(TASK_RUNNING, runningStatus.get().state());
> >         EXPECT_EQ(task.task_id(), runningStatus.get().task_id());
> >       }
> >     
> >       slaveDetector.appoint(None());
> >     
> >       master->reset();
> >       master = StartMaster(masterFlags);
> >       ASSERT_SOME(master);
> >     
> >       Future<SlaveReregisteredMessage> slaveReregisteredMessage =
> >         FUTURE_PROTOBUF(SlaveReregisteredMessage(), _, _);
> >     
> >       slaveDetector.appoint(master.get()->pid);
> >     
> >       AWAIT_READY(slaveReregisteredMessage);
> >     
> >       frameworkInfo.mutable_id()->CopyFrom(frameworkId.get());
> >       frameworkInfo.add_roles("role2");
> >       frameworkInfo.clear_role();
> >       frameworkInfo.add_capabilities()->set_type(
> >           FrameworkInfo::Capability::MULTI_ROLE);
> >     
> >       MockScheduler sched;
> >       MesosSchedulerDriver driver(
> >           &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
> >     
> >       Future<Nothing> registered;
> >       EXPECT_CALL(sched, registered(&driver, _, _))
> >         .WillOnce(FutureSatisfy(&registered));
> >     
> >       driver.start();
> >     
> >       Clock::pause();
> >       Clock::settle();
> >       Clock::resume();
> >     
> >       AWAIT_READY(registered);
> >     
> >       EXPECT_CALL(exec, shutdown(_))
> >         .Times(AtMost(1));
> >     
> >       driver.stop();
> >       driver.join();
> >     ```

Thanks for spotting this, this is of course unacceptable. Also your test case was helpful. I updated the patch to allow `Master::activateRecoveredFramework` to fail.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/#review161827
-----------------------------------------------------------


On Jan. 17, 2017, 6:13 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55571/
> -----------------------------------------------------------
> 
> (Updated Jan. 17, 2017, 6:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6631
>     https://issues.apache.org/jira/browse/MESOS-6631
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This allows us to confirm that an updated FrameworkInfo is fully
> compatible with an already known one. By performing this check in
> updateFrameworkInfo when can be sure that both the new and old
> FrameworkInfo are available (e.g., after framework reregistration, or
> master or framework failover).
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 44f4fecb1fbe8bebf830990a59a5462338e6e004 
>   src/master/master.cpp b863ff6e93931c3d1ee056248084c7f44caf2fd9 
> 
> Diff: https://reviews.apache.org/r/55571/diff/
> 
> 
> Testing
> -------
> 
> Tested as part of https://reviews.apache.org/r/55271/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Jan. 17, 2017, 10:07 a.m., Jay Guo wrote:
> > src/master/master.cpp, lines 7201-7203
> > <https://reviews.apache.org/r/55571/diff/1/?file=1605757#file1605757line7201>
> >
> >     There may be some corner cases that causes this `CHECK` to coredump. For example:
> >     1. Start a Mesos cluster with master, agent and a scheduler with single role `role1`.
> >     2. Master failover
> >     3. Agent re-register before scheduler does
> >     4. Scheduler re-subscribe with multi-role `{role2}`
> >     
> >     In this case, master re-construct FrameworkInfo from re-registered agent, which contains old role. However, in current implementation, `Error` of `updateFrameworkInfo` is not caught for recovered frameworks.
> >     
> >     In theory, this test case shouldn't cause master coredump:
> >     ```cpp
> >       master::Flags masterFlags = CreateMasterFlags();
> >       Try<Owned<cluster::Master>> master = StartMaster(masterFlags);
> >       ASSERT_SOME(master);
> >     
> >       MockExecutor exec(DEFAULT_EXECUTOR_ID);
> >       TestContainerizer containerizer(&exec);
> >     
> >       StandaloneMasterDetector slaveDetector(master.get()->pid);
> >       Try<Owned<cluster::Slave>> slave = StartSlave(&slaveDetector, &containerizer);
> >       ASSERT_SOME(slave);
> >     
> >       FrameworkInfo frameworkInfo = DEFAULT_FRAMEWORK_INFO;
> >       frameworkInfo.set_failover_timeout(1024);
> >       frameworkInfo.set_role("role1");
> >     
> >       Future<FrameworkID> frameworkId;
> >     
> >       {
> >         MockScheduler sched;
> >         MesosSchedulerDriver driver(
> >             &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
> >     
> >         EXPECT_CALL(sched, registered(&driver, _, _))
> >           .WillOnce(FutureArg<1>(&frameworkId));
> >     
> >         Future<vector<Offer>> offers;
> >         EXPECT_CALL(sched, resourceOffers(&driver, _))
> >           .WillOnce(FutureArg<1>(&offers))
> >           .WillRepeatedly(Return()); // Ignore subsequent offers.
> >     
> >         driver.start();
> >     
> >         Clock::pause();
> >         Clock::settle();
> >         Clock::resume();
> >     
> >         AWAIT_READY(frameworkId);
> >     
> >         AWAIT_READY(offers);
> >         EXPECT_FALSE(offers->empty());
> >     
> >         TaskInfo task;
> >         task.set_name("");
> >         task.mutable_task_id()->set_value("1");
> >         task.mutable_slave_id()->MergeFrom(offers.get()[0].slave_id());
> >         task.mutable_resources()->MergeFrom(offers.get()[0].resources());
> >         task.mutable_executor()->MergeFrom(DEFAULT_EXECUTOR_INFO);
> >     
> >         EXPECT_CALL(exec, registered(_, _, _, _))
> >           .Times(1);
> >     
> >         EXPECT_CALL(exec, launchTask(_, _))
> >           .WillOnce(SendStatusUpdateFromTask(TASK_RUNNING));
> >     
> >         Future<TaskStatus> runningStatus;
> >         EXPECT_CALL(sched, statusUpdate(&driver, _))
> >           .WillOnce(FutureArg<1>(&runningStatus));
> >     
> >         driver.launchTasks(offers.get()[0].id(), {task});
> >     
> >         AWAIT_READY(runningStatus);
> >         EXPECT_EQ(TASK_RUNNING, runningStatus.get().state());
> >         EXPECT_EQ(task.task_id(), runningStatus.get().task_id());
> >       }
> >     
> >       slaveDetector.appoint(None());
> >     
> >       master->reset();
> >       master = StartMaster(masterFlags);
> >       ASSERT_SOME(master);
> >     
> >       Future<SlaveReregisteredMessage> slaveReregisteredMessage =
> >         FUTURE_PROTOBUF(SlaveReregisteredMessage(), _, _);
> >     
> >       slaveDetector.appoint(master.get()->pid);
> >     
> >       AWAIT_READY(slaveReregisteredMessage);
> >     
> >       frameworkInfo.mutable_id()->CopyFrom(frameworkId.get());
> >       frameworkInfo.add_roles("role2");
> >       frameworkInfo.clear_role();
> >       frameworkInfo.add_capabilities()->set_type(
> >           FrameworkInfo::Capability::MULTI_ROLE);
> >     
> >       MockScheduler sched;
> >       MesosSchedulerDriver driver(
> >           &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
> >     
> >       Future<Nothing> registered;
> >       EXPECT_CALL(sched, registered(&driver, _, _))
> >         .WillOnce(FutureSatisfy(&registered));
> >     
> >       driver.start();
> >     
> >       Clock::pause();
> >       Clock::settle();
> >       Clock::resume();
> >     
> >       AWAIT_READY(registered);
> >     
> >       EXPECT_CALL(exec, shutdown(_))
> >         .Times(AtMost(1));
> >     
> >       driver.stop();
> >       driver.join();
> >     ```
> 
> Benjamin Bannier wrote:
>     Thanks for spotting this, this is of course unacceptable. Also your test case was helpful. I updated the patch to allow `Master::activateRecoveredFramework` to fail.
> 
> Jay Guo wrote:
>     Maybe you could add this test case as part of our test suite as well?

I added the test case to the follow-up patch, https://reviews.apache.org/r/55271.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/#review161827
-----------------------------------------------------------


On Jan. 18, 2017, 4:26 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55571/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2017, 4:26 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6631
>     https://issues.apache.org/jira/browse/MESOS-6631
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This allows us to confirm that an updated FrameworkInfo is fully
> compatible with an already known one. By performing this check in
> updateFrameworkInfo when can be sure that both the new and old
> FrameworkInfo are available (e.g., after framework reregistration, or
> master or framework failover).
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8e8a9037af11cf95961b6498540a0fd486ed091b 
>   src/master/master.cpp 73159328ce3fd838e02eba0e6a30cf69efc319ba 
> 
> Diff: https://reviews.apache.org/r/55571/diff/
> 
> 
> Testing
> -------
> 
> Tested as part of https://reviews.apache.org/r/55271/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Jay Guo <gu...@gmail.com>.

> On Jan. 17, 2017, 5:07 p.m., Jay Guo wrote:
> > src/master/master.cpp, lines 7201-7203
> > <https://reviews.apache.org/r/55571/diff/1/?file=1605757#file1605757line7201>
> >
> >     There may be some corner cases that causes this `CHECK` to coredump. For example:
> >     1. Start a Mesos cluster with master, agent and a scheduler with single role `role1`.
> >     2. Master failover
> >     3. Agent re-register before scheduler does
> >     4. Scheduler re-subscribe with multi-role `{role2}`
> >     
> >     In this case, master re-construct FrameworkInfo from re-registered agent, which contains old role. However, in current implementation, `Error` of `updateFrameworkInfo` is not caught for recovered frameworks.
> >     
> >     In theory, this test case shouldn't cause master coredump:
> >     ```cpp
> >       master::Flags masterFlags = CreateMasterFlags();
> >       Try<Owned<cluster::Master>> master = StartMaster(masterFlags);
> >       ASSERT_SOME(master);
> >     
> >       MockExecutor exec(DEFAULT_EXECUTOR_ID);
> >       TestContainerizer containerizer(&exec);
> >     
> >       StandaloneMasterDetector slaveDetector(master.get()->pid);
> >       Try<Owned<cluster::Slave>> slave = StartSlave(&slaveDetector, &containerizer);
> >       ASSERT_SOME(slave);
> >     
> >       FrameworkInfo frameworkInfo = DEFAULT_FRAMEWORK_INFO;
> >       frameworkInfo.set_failover_timeout(1024);
> >       frameworkInfo.set_role("role1");
> >     
> >       Future<FrameworkID> frameworkId;
> >     
> >       {
> >         MockScheduler sched;
> >         MesosSchedulerDriver driver(
> >             &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
> >     
> >         EXPECT_CALL(sched, registered(&driver, _, _))
> >           .WillOnce(FutureArg<1>(&frameworkId));
> >     
> >         Future<vector<Offer>> offers;
> >         EXPECT_CALL(sched, resourceOffers(&driver, _))
> >           .WillOnce(FutureArg<1>(&offers))
> >           .WillRepeatedly(Return()); // Ignore subsequent offers.
> >     
> >         driver.start();
> >     
> >         Clock::pause();
> >         Clock::settle();
> >         Clock::resume();
> >     
> >         AWAIT_READY(frameworkId);
> >     
> >         AWAIT_READY(offers);
> >         EXPECT_FALSE(offers->empty());
> >     
> >         TaskInfo task;
> >         task.set_name("");
> >         task.mutable_task_id()->set_value("1");
> >         task.mutable_slave_id()->MergeFrom(offers.get()[0].slave_id());
> >         task.mutable_resources()->MergeFrom(offers.get()[0].resources());
> >         task.mutable_executor()->MergeFrom(DEFAULT_EXECUTOR_INFO);
> >     
> >         EXPECT_CALL(exec, registered(_, _, _, _))
> >           .Times(1);
> >     
> >         EXPECT_CALL(exec, launchTask(_, _))
> >           .WillOnce(SendStatusUpdateFromTask(TASK_RUNNING));
> >     
> >         Future<TaskStatus> runningStatus;
> >         EXPECT_CALL(sched, statusUpdate(&driver, _))
> >           .WillOnce(FutureArg<1>(&runningStatus));
> >     
> >         driver.launchTasks(offers.get()[0].id(), {task});
> >     
> >         AWAIT_READY(runningStatus);
> >         EXPECT_EQ(TASK_RUNNING, runningStatus.get().state());
> >         EXPECT_EQ(task.task_id(), runningStatus.get().task_id());
> >       }
> >     
> >       slaveDetector.appoint(None());
> >     
> >       master->reset();
> >       master = StartMaster(masterFlags);
> >       ASSERT_SOME(master);
> >     
> >       Future<SlaveReregisteredMessage> slaveReregisteredMessage =
> >         FUTURE_PROTOBUF(SlaveReregisteredMessage(), _, _);
> >     
> >       slaveDetector.appoint(master.get()->pid);
> >     
> >       AWAIT_READY(slaveReregisteredMessage);
> >     
> >       frameworkInfo.mutable_id()->CopyFrom(frameworkId.get());
> >       frameworkInfo.add_roles("role2");
> >       frameworkInfo.clear_role();
> >       frameworkInfo.add_capabilities()->set_type(
> >           FrameworkInfo::Capability::MULTI_ROLE);
> >     
> >       MockScheduler sched;
> >       MesosSchedulerDriver driver(
> >           &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
> >     
> >       Future<Nothing> registered;
> >       EXPECT_CALL(sched, registered(&driver, _, _))
> >         .WillOnce(FutureSatisfy(&registered));
> >     
> >       driver.start();
> >     
> >       Clock::pause();
> >       Clock::settle();
> >       Clock::resume();
> >     
> >       AWAIT_READY(registered);
> >     
> >       EXPECT_CALL(exec, shutdown(_))
> >         .Times(AtMost(1));
> >     
> >       driver.stop();
> >       driver.join();
> >     ```
> 
> Benjamin Bannier wrote:
>     Thanks for spotting this, this is of course unacceptable. Also your test case was helpful. I updated the patch to allow `Master::activateRecoveredFramework` to fail.

Maybe you could add this test case as part of our test suite as well?


- Jay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/#review161827
-----------------------------------------------------------


On Jan. 18, 2017, 1:13 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55571/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2017, 1:13 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6631
>     https://issues.apache.org/jira/browse/MESOS-6631
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This allows us to confirm that an updated FrameworkInfo is fully
> compatible with an already known one. By performing this check in
> updateFrameworkInfo when can be sure that both the new and old
> FrameworkInfo are available (e.g., after framework reregistration, or
> master or framework failover).
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 44f4fecb1fbe8bebf830990a59a5462338e6e004 
>   src/master/master.cpp b863ff6e93931c3d1ee056248084c7f44caf2fd9 
> 
> Diff: https://reviews.apache.org/r/55571/diff/
> 
> 
> Testing
> -------
> 
> Tested as part of https://reviews.apache.org/r/55271/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Jay Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/#review161827
-----------------------------------------------------------




src/master/master.cpp (lines 7201 - 7203)
<https://reviews.apache.org/r/55571/#comment233128>

    There may be some corner cases that causes this `CHECK` to coredump. For example:
    1. Start a Mesos cluster with master, agent and a scheduler with single role `role1`.
    2. Master failover
    3. Agent re-register before scheduler does
    4. Scheduler re-subscribe with multi-role `{role2}`
    
    In this case, master re-construct FrameworkInfo from re-registered agent, which contains old role. However, in current implementation, `Error` of `updateFrameworkInfo` is not caught for recovered frameworks.
    
    In theory, this test case shouldn't cause master coredump:
    ```cpp
      master::Flags masterFlags = CreateMasterFlags();
      Try<Owned<cluster::Master>> master = StartMaster(masterFlags);
      ASSERT_SOME(master);
    
      MockExecutor exec(DEFAULT_EXECUTOR_ID);
      TestContainerizer containerizer(&exec);
    
      StandaloneMasterDetector slaveDetector(master.get()->pid);
      Try<Owned<cluster::Slave>> slave = StartSlave(&slaveDetector, &containerizer);
      ASSERT_SOME(slave);
    
      FrameworkInfo frameworkInfo = DEFAULT_FRAMEWORK_INFO;
      frameworkInfo.set_failover_timeout(1024);
      frameworkInfo.set_role("role1");
    
      Future<FrameworkID> frameworkId;
    
      {
        MockScheduler sched;
        MesosSchedulerDriver driver(
            &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
    
        EXPECT_CALL(sched, registered(&driver, _, _))
          .WillOnce(FutureArg<1>(&frameworkId));
    
        Future<vector<Offer>> offers;
        EXPECT_CALL(sched, resourceOffers(&driver, _))
          .WillOnce(FutureArg<1>(&offers))
          .WillRepeatedly(Return()); // Ignore subsequent offers.
    
        driver.start();
    
        Clock::pause();
        Clock::settle();
        Clock::resume();
    
        AWAIT_READY(frameworkId);
    
        AWAIT_READY(offers);
        EXPECT_FALSE(offers->empty());
    
        TaskInfo task;
        task.set_name("");
        task.mutable_task_id()->set_value("1");
        task.mutable_slave_id()->MergeFrom(offers.get()[0].slave_id());
        task.mutable_resources()->MergeFrom(offers.get()[0].resources());
        task.mutable_executor()->MergeFrom(DEFAULT_EXECUTOR_INFO);
    
        EXPECT_CALL(exec, registered(_, _, _, _))
          .Times(1);
    
        EXPECT_CALL(exec, launchTask(_, _))
          .WillOnce(SendStatusUpdateFromTask(TASK_RUNNING));
    
        Future<TaskStatus> runningStatus;
        EXPECT_CALL(sched, statusUpdate(&driver, _))
          .WillOnce(FutureArg<1>(&runningStatus));
    
        driver.launchTasks(offers.get()[0].id(), {task});
    
        AWAIT_READY(runningStatus);
        EXPECT_EQ(TASK_RUNNING, runningStatus.get().state());
        EXPECT_EQ(task.task_id(), runningStatus.get().task_id());
      }
    
      slaveDetector.appoint(None());
    
      master->reset();
      master = StartMaster(masterFlags);
      ASSERT_SOME(master);
    
      Future<SlaveReregisteredMessage> slaveReregisteredMessage =
        FUTURE_PROTOBUF(SlaveReregisteredMessage(), _, _);
    
      slaveDetector.appoint(master.get()->pid);
    
      AWAIT_READY(slaveReregisteredMessage);
    
      frameworkInfo.mutable_id()->CopyFrom(frameworkId.get());
      frameworkInfo.add_roles("role2");
      frameworkInfo.clear_role();
      frameworkInfo.add_capabilities()->set_type(
          FrameworkInfo::Capability::MULTI_ROLE);
    
      MockScheduler sched;
      MesosSchedulerDriver driver(
          &sched, frameworkInfo, master.get()->pid, DEFAULT_CREDENTIAL);
    
      Future<Nothing> registered;
      EXPECT_CALL(sched, registered(&driver, _, _))
        .WillOnce(FutureSatisfy(&registered));
    
      driver.start();
    
      Clock::pause();
      Clock::settle();
      Clock::resume();
    
      AWAIT_READY(registered);
    
      EXPECT_CALL(exec, shutdown(_))
        .Times(AtMost(1));
    
      driver.stop();
      driver.join();
    ```


- Jay Guo


On Jan. 16, 2017, 7:28 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55571/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2017, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6631
>     https://issues.apache.org/jira/browse/MESOS-6631
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This allows us to confirm that an updated FrameworkInfo is fully
> compatible with an already known one. By performing this check in
> updateFrameworkInfo when can be sure that both the new and old
> FrameworkInfo are available (e.g., after framework reregistration, or
> master or framework failover).
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 44f4fecb1fbe8bebf830990a59a5462338e6e004 
>   src/master/master.cpp b863ff6e93931c3d1ee056248084c7f44caf2fd9 
> 
> Diff: https://reviews.apache.org/r/55571/diff/
> 
> 
> Testing
> -------
> 
> Tested as part of https://reviews.apache.org/r/55271/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Michael Park <mp...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/#review162534
-----------------------------------------------------------


Fix it, then Ship it!





src/master/master.cpp (lines 2580 - 2581)
<https://reviews.apache.org/r/55571/#comment233839>

    ```cpp
        Try<Nothing> updateFrameworkInfo =
          framework->updateFrameworkInfo(frameworkInfo);
    ```
    
    A few more instances of this below.


- Michael Park


On Jan. 18, 2017, 7:26 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55571/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2017, 7:26 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6631
>     https://issues.apache.org/jira/browse/MESOS-6631
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This allows us to confirm that an updated FrameworkInfo is fully
> compatible with an already known one. By performing this check in
> updateFrameworkInfo when can be sure that both the new and old
> FrameworkInfo are available (e.g., after framework reregistration, or
> master or framework failover).
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8e8a9037af11cf95961b6498540a0fd486ed091b 
>   src/master/master.cpp 73159328ce3fd838e02eba0e6a30cf69efc319ba 
> 
> Diff: https://reviews.apache.org/r/55571/diff/
> 
> 
> Testing
> -------
> 
> Tested as part of https://reviews.apache.org/r/55271/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Michael Park <mp...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/#review162915
-----------------------------------------------------------


Fix it, then Ship it!





src/master/master.cpp (lines 2607 - 2613)
<https://reviews.apache.org/r/55571/#comment234309>

    Is there a reason why we don't `LOG(INFO)` here and below?


- Michael Park


On Jan. 24, 2017, 4:34 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55571/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 4:34 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, Guangya Liu, and Michael Park.
> 
> 
> Bugs: MESOS-6631
>     https://issues.apache.org/jira/browse/MESOS-6631
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This allows us to confirm that an updated FrameworkInfo is fully
> compatible with an already known one. By performing this check in
> updateFrameworkInfo when can be sure that both the new and old
> FrameworkInfo are available (e.g., after framework reregistration, or
> master or framework failover).
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 044b7e697f77c9e2cefb938bda4307d328b33766 
>   include/mesos/v1/mesos.proto 7706de89d91926aedb10e4c882dffcf050f03bb1 
>   src/master/master.hpp 4bd01fd21820b7cdb6c2a23436d50e00fbbd5849 
>   src/master/master.cpp d2aee2b6e2a5da776c3b046a228d5ffcabd1bd54 
> 
> Diff: https://reviews.apache.org/r/55571/diff/
> 
> 
> Testing
> -------
> 
> Tested as part of https://reviews.apache.org/r/55271/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/
-----------------------------------------------------------

(Updated Jan. 25, 2017, 8:33 a.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, Guangya Liu, and Michael Park.


Changes
-------

Added more logging.


Bugs: MESOS-6631
    https://issues.apache.org/jira/browse/MESOS-6631


Repository: mesos


Description
-------

This allows us to confirm that an updated FrameworkInfo is fully
compatible with an already known one. By performing this check in
updateFrameworkInfo when can be sure that both the new and old
FrameworkInfo are available (e.g., after framework reregistration, or
master or framework failover).


Diffs (updated)
-----

  src/master/master.hpp 4bd01fd21820b7cdb6c2a23436d50e00fbbd5849 
  src/master/master.cpp d2aee2b6e2a5da776c3b046a228d5ffcabd1bd54 

Diff: https://reviews.apache.org/r/55571/diff/


Testing
-------

Tested as part of https://reviews.apache.org/r/55271/.


Thanks,

Benjamin Bannier


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/
-----------------------------------------------------------

(Updated Jan. 25, 2017, 1:34 a.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, Guangya Liu, and Michael Park.


Changes
-------

Rebased.


Bugs: MESOS-6631
    https://issues.apache.org/jira/browse/MESOS-6631


Repository: mesos


Description
-------

This allows us to confirm that an updated FrameworkInfo is fully
compatible with an already known one. By performing this check in
updateFrameworkInfo when can be sure that both the new and old
FrameworkInfo are available (e.g., after framework reregistration, or
master or framework failover).


Diffs (updated)
-----

  include/mesos/mesos.proto 044b7e697f77c9e2cefb938bda4307d328b33766 
  include/mesos/v1/mesos.proto 7706de89d91926aedb10e4c882dffcf050f03bb1 
  src/master/master.hpp 4bd01fd21820b7cdb6c2a23436d50e00fbbd5849 
  src/master/master.cpp d2aee2b6e2a5da776c3b046a228d5ffcabd1bd54 

Diff: https://reviews.apache.org/r/55571/diff/


Testing
-------

Tested as part of https://reviews.apache.org/r/55271/.


Thanks,

Benjamin Bannier


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/
-----------------------------------------------------------

(Updated Jan. 21, 2017, 1:39 p.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, Guangya Liu, and Michael Park.


Changes
-------

Fixed ws issues.


Bugs: MESOS-6631
    https://issues.apache.org/jira/browse/MESOS-6631


Repository: mesos


Description
-------

This allows us to confirm that an updated FrameworkInfo is fully
compatible with an already known one. By performing this check in
updateFrameworkInfo when can be sure that both the new and old
FrameworkInfo are available (e.g., after framework reregistration, or
master or framework failover).


Diffs (updated)
-----

  src/master/master.hpp 8e8a9037af11cf95961b6498540a0fd486ed091b 
  src/master/master.cpp 73159328ce3fd838e02eba0e6a30cf69efc319ba 

Diff: https://reviews.apache.org/r/55571/diff/


Testing
-------

Tested as part of https://reviews.apache.org/r/55271/.


Thanks,

Benjamin Bannier


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/
-----------------------------------------------------------

(Updated Jan. 18, 2017, 4:26 p.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.


Changes
-------

Rebased.


Bugs: MESOS-6631
    https://issues.apache.org/jira/browse/MESOS-6631


Repository: mesos


Description
-------

This allows us to confirm that an updated FrameworkInfo is fully
compatible with an already known one. By performing this check in
updateFrameworkInfo when can be sure that both the new and old
FrameworkInfo are available (e.g., after framework reregistration, or
master or framework failover).


Diffs (updated)
-----

  src/master/master.hpp 8e8a9037af11cf95961b6498540a0fd486ed091b 
  src/master/master.cpp 73159328ce3fd838e02eba0e6a30cf69efc319ba 

Diff: https://reviews.apache.org/r/55571/diff/


Testing
-------

Tested as part of https://reviews.apache.org/r/55271/.


Thanks,

Benjamin Bannier


Re: Review Request 55571: Changed Master::Framework::updateFrameworkInfo so it can return errors.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55571/
-----------------------------------------------------------

(Updated Jan. 17, 2017, 6:13 p.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.


Changes
-------

Comment from guoger: Allow Master::activateRecoveredFramework to fail.


Bugs: MESOS-6631
    https://issues.apache.org/jira/browse/MESOS-6631


Repository: mesos


Description
-------

This allows us to confirm that an updated FrameworkInfo is fully
compatible with an already known one. By performing this check in
updateFrameworkInfo when can be sure that both the new and old
FrameworkInfo are available (e.g., after framework reregistration, or
master or framework failover).


Diffs (updated)
-----

  src/master/master.hpp 44f4fecb1fbe8bebf830990a59a5462338e6e004 
  src/master/master.cpp b863ff6e93931c3d1ee056248084c7f44caf2fd9 

Diff: https://reviews.apache.org/r/55571/diff/


Testing
-------

Tested as part of https://reviews.apache.org/r/55271/.


Thanks,

Benjamin Bannier