You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Niklas Nielsen <ni...@qni.dk> on 2013/10/08 21:37:34 UTC

Review Request 14540: Task reconciliation for frameworks

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp cdfae1d 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Ben Mahler <be...@gmail.com>.

> On Oct. 8, 2013, 10:36 p.m., Benjamin Hindman wrote:
> > include/mesos/scheduler.hpp, line 288
> > <https://reviews.apache.org/r/14540/diff/1/?file=362385#file362385line288>
> >
> >     s/expectedStatuses/statuses/g
> >     
> >     Since the context (doing reconciliation) is sufficient and this is the preferred style in the codebase, i.e., keep functions simple enough that we don't need to over explain variable names because we don't have that many variables. ;)

Did you forget to upload the diff? This is still expectedStatuses.


> On Oct. 8, 2013, 10:36 p.m., Benjamin Hindman wrote:
> > src/common/protobuf_utils.hpp, line 66
> > <https://reviews.apache.org/r/14540/diff/1/?file=362386#file362386line66>
> >
> >     We should also set the slave ID for updates that come directly from executors that didn't set it themselves (or possibly set it incorrectly) in Slave::statusUpdate(const StatusUpdate& update, const UPID& pid).
> 
> Niklas Nielsen wrote:
>     As the status update object is const, would it make sense to postpone setting the slave id until we have a message to send in StatusUpdateManagerProcess? 
>     Or are there other call paths were the updates ends up too?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26796
-----------------------------------------------------------


On Oct. 9, 2013, 12:09 a.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2013, 12:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp cdfae1d 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 8, 2013, 10:36 p.m., Benjamin Hindman wrote:
> > include/mesos/scheduler.hpp, line 288
> > <https://reviews.apache.org/r/14540/diff/1/?file=362385#file362385line288>
> >
> >     s/expectedStatuses/statuses/g
> >     
> >     Since the context (doing reconciliation) is sufficient and this is the preferred style in the codebase, i.e., keep functions simple enough that we don't need to over explain variable names because we don't have that many variables. ;)
> 
> Ben Mahler wrote:
>     Did you forget to upload the diff? This is still expectedStatuses.

Good point - forgot s/../../g in include/


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26796
-----------------------------------------------------------


On Oct. 9, 2013, 12:09 a.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2013, 12:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp cdfae1d 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 8, 2013, 10:36 p.m., Benjamin Hindman wrote:
> > src/common/protobuf_utils.hpp, line 66
> > <https://reviews.apache.org/r/14540/diff/1/?file=362386#file362386line66>
> >
> >     We should also set the slave ID for updates that come directly from executors that didn't set it themselves (or possibly set it incorrectly) in Slave::statusUpdate(const StatusUpdate& update, const UPID& pid).

As the status update object is const, would it make sense to postpone setting the slave id until we have a message to send in StatusUpdateManagerProcess? 
Or are there other call paths were the updates ends up too?


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26796
-----------------------------------------------------------


On Oct. 8, 2013, 7:37 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 8, 2013, 7:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp cdfae1d 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26796
-----------------------------------------------------------

Ship it!



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52146>

    s/expectedStatuses/statuses/g
    
    Since the context (doing reconciliation) is sufficient and this is the preferred style in the codebase, i.e., keep functions simple enough that we don't need to over explain variable names because we don't have that many variables. ;)



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/14540/#comment52135>

    We should also set the slave ID for updates that come directly from executors that didn't set it themselves (or possibly set it incorrectly) in Slave::statusUpdate(const StatusUpdate& update, const UPID& pid).



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52136>

    void Master::reconcileTasks(
        const FrameworkID& frameworkId,
        const std::vector<TaskStatus>& expectedStatuses)



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52147>

    Mark this as a TODO instead of a NOTE.



src/sched/sched.cpp
<https://reviews.apache.org/r/14540/#comment52138>

    We put two newlines in between "top-level" functions (i.e., not functions declared in the body of classes).



src/sched/sched.cpp
<https://reviews.apache.org/r/14540/#comment52145>

    Ditto.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52137>

    We put two newlines in between tests.


- Benjamin Hindman


On Oct. 8, 2013, 7:37 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 8, 2013, 7:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp cdfae1d 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 10, 2013, 5:19 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1791
> > <https://reviews.apache.org/r/14540/diff/3/?file=362828#file362828line1791>
> >
> >     The more that I look at this the more I think being more explicit here (rather than in a helper in some other source file) might be better:
> >     
> >     void Slave::statusUpdate(StatusUpdate update_, const UPID& pid)
> >     {
> >       // Incoming status update might come from an executor which has not set
> >       // slave id in TaskStatus. Set/overwrite slave id.
> >       update_.mutable_status()->mutable_slave_id()->CopyFrom(id);
> >       const StatusUpdate& update = update_;
> >     
> >       LOG(INFO) << ...
> >     
> >       CHECK(state == ...
> >     }
> >     
> >     And then we can kill the helper.

We can localize removal of const:

void Slave::statusUpdate(const StatusUpdate update_, const UPID& pid)
{
  // Incoming status update might come from an executor which has not set
  // slave id in TaskStatus. Set/overwrite slave id.
  StatusUpdate mutable_update(update_);
  mutable_update.mutable_status()
    ->mutable_slave_id()->CopyFrom(update_.slave_id());
  const StatusUpdate& update(mutable_update);
  ...

Or do you prefer that we change the call sites too?


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26874
-----------------------------------------------------------


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On Oct. 10, 2013, 5:19 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1791
> > <https://reviews.apache.org/r/14540/diff/3/?file=362828#file362828line1791>
> >
> >     The more that I look at this the more I think being more explicit here (rather than in a helper in some other source file) might be better:
> >     
> >     void Slave::statusUpdate(StatusUpdate update_, const UPID& pid)
> >     {
> >       // Incoming status update might come from an executor which has not set
> >       // slave id in TaskStatus. Set/overwrite slave id.
> >       update_.mutable_status()->mutable_slave_id()->CopyFrom(id);
> >       const StatusUpdate& update = update_;
> >     
> >       LOG(INFO) << ...
> >     
> >       CHECK(state == ...
> >     }
> >     
> >     And then we can kill the helper.
> 
> Niklas Nielsen wrote:
>     We can localize removal of const:
>     
>     void Slave::statusUpdate(const StatusUpdate update_, const UPID& pid)
>     {
>       // Incoming status update might come from an executor which has not set
>       // slave id in TaskStatus. Set/overwrite slave id.
>       StatusUpdate mutable_update(update_);
>       mutable_update.mutable_status()
>         ->mutable_slave_id()->CopyFrom(update_.slave_id());
>       const StatusUpdate& update(mutable_update);
>       ...
>     
>     Or do you prefer that we change the call sites too?

I don't think we need the extra temporary and can use the "temporary" in the parameter list. We've used this style in other places in the codebase too.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26874
-----------------------------------------------------------


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26874
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/14540/#comment52273>

    The more that I look at this the more I think being more explicit here (rather than in a helper in some other source file) might be better:
    
    void Slave::statusUpdate(StatusUpdate update_, const UPID& pid)
    {
      // Incoming status update might come from an executor which has not set
      // slave id in TaskStatus. Set/overwrite slave id.
      update_.mutable_status()->mutable_slave_id()->CopyFrom(id);
      const StatusUpdate& update = update_;
    
      LOG(INFO) << ...
    
      CHECK(state == ...
    }
    
    And then we can kill the helper. 


- Benjamin Hindman


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 10, 2013, 8:04 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 1799-1802
> > <https://reviews.apache.org/r/14540/diff/3/?file=362828#file362828line1799>
> >
> >     Why do we do need it here now that executor driver does it?
> 
> Niklas Nielsen wrote:
>     Strange - should have been removed, but I might have missed it.
>     Is that still in the new diff? I can't seem to find it.

it's probably me. i somehow ended up looking at diff 3. ignore this comment.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26893
-----------------------------------------------------------


On Oct. 10, 2013, 8:47 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 8:47 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 10, 2013, 8:04 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 1799-1802
> > <https://reviews.apache.org/r/14540/diff/3/?file=362828#file362828line1799>
> >
> >     Why do we do need it here now that executor driver does it?

Strange - should have been removed, but I might have missed it.
Is that still in the new diff? I can't seem to find it.


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26893
-----------------------------------------------------------


On Oct. 10, 2013, 8:47 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 8:47 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26893
-----------------------------------------------------------

Ship it!



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52327>

    Sorry didn't catch this earlier.
    
    How about:
    
    "Reconciliation of tasks causes master to send status updates for tasks whose status
    differs from the status sent here."



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52328>

    LOG(WARNING) << "Unknown framework at " << from << " attempted to reconcile tasks"; 



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52329>

    A warning might be nice here.



src/slave/slave.cpp
<https://reviews.apache.org/r/14540/#comment52330>

    Why do we do need it here now that executor driver does it?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52331>

    Our convention has been to put it up above the TEST_F() line, but I'll leave it up to you.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52332>

    camel case variable names please.


- Vinod Kone


On Oct. 10, 2013, 7:35 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 7:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Ben Mahler <be...@gmail.com>.

> On Oct. 10, 2013, 10:26 p.m., Ben Mahler wrote:
> > src/tests/master_tests.cpp, line 1082
> > <https://reviews.apache.org/r/14540/diff/6/?file=363479#file363479line1082>
> >
> >     You can kill this as well, since we'll get a warning and we usually don't enforce at most 1. Up to you.

Scratch that, this is so that we don't get a warning if it isn't called. So leave as is :)


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26901
-----------------------------------------------------------


On Oct. 10, 2013, 8:53 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 8:53 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 10, 2013, 10:26 p.m., Ben Mahler wrote:
> > src/common/protobuf_utils.hpp, line 66
> > <https://reviews.apache.org/r/14540/diff/6/?file=363468#file363468line66>
> >
> >     I think these should all be using CopyFrom (as you used elsewhere in this review), feel free to change these here or leave as is to match the rest.

I am a bit puzzled that this still shows up. It should not be included in the latest patch.


> On Oct. 10, 2013, 10:26 p.m., Ben Mahler wrote:
> > src/exec/exec.cpp, line 471
> > <https://reviews.apache.org/r/14540/diff/6/?file=363469#file363469line471>
> >
> >     In general we avoid passing non const references for a few reasons, in this case it's not immediately obvious to a caller of sendStatusUpdate() that the status update would be modified.
> >     
> >     Can you instead keep this as a const reference but add a line here:
> >     
> >     update->mutable_status()->MergeFrom(status);
> >     
> >     // Overwrite the slave id, in case the executor did not set it.
> >     update->mutable_status()->mutable_slave_id()->CopyFrom(slaveId);

Awesome - good point.


> On Oct. 10, 2013, 10:26 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 1399
> > <https://reviews.apache.org/r/14540/diff/6/?file=363474#file363474line1399>
> >
> >     is NQN your handle?

You are right - nnielsen would be the one.


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26901
-----------------------------------------------------------


On Oct. 10, 2013, 10:56 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 10:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 10, 2013, 10:26 p.m., Ben Mahler wrote:
> > src/common/protobuf_utils.hpp, line 66
> > <https://reviews.apache.org/r/14540/diff/6/?file=363468#file363468line66>
> >
> >     I think these should all be using CopyFrom (as you used elsewhere in this review), feel free to change these here or leave as is to match the rest.
> 
> Niklas Nielsen wrote:
>     I am a bit puzzled that this still shows up. It should not be included in the latest patch.

Ah - disregard that comment. This should be there :)


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26901
-----------------------------------------------------------


On Oct. 10, 2013, 10:56 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 10:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26901
-----------------------------------------------------------

Ship it!


Great! I should get going on the registrar integration :)


include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52350>

    newline



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52345>

    s/master/the master/



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/14540/#comment52346>

    I think these should all be using CopyFrom (as you used elsewhere in this review), feel free to change these here or leave as is to match the rest.



src/exec/exec.cpp
<https://reviews.apache.org/r/14540/#comment52349>

    In general we avoid passing non const references for a few reasons, in this case it's not immediately obvious to a caller of sendStatusUpdate() that the status update would be modified.
    
    Can you instead keep this as a const reference but add a line here:
    
    update->mutable_status()->MergeFrom(status);
    
    // Overwrite the slave id, in case the executor did not set it.
    update->mutable_status()->mutable_slave_id()->CopyFrom(slaveId);



src/exec/exec.cpp
<https://reviews.apache.org/r/14540/#comment52347>

    Feel free to update the others here to use CopyFrom as opposed to MergeFrom. Up to you.



src/java/src/org/apache/mesos/SchedulerDriver.java
<https://reviews.apache.org/r/14540/#comment52351>

    Would be nice if the documentation was the same as in scheduler.hpp file. Not sure if we've always done this but we should! :)



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52355>

    Including the framework id would be nice.
    
    s/framework/framework " << frameworkId << "/



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52361>

    A log message would be nice here:
    
    LOG(INFO) << "Performing task state reconciliation for framework " << frameworkId;



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52356>

    when:



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52357>

    end these with periods :)



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52358>

    is NQN your handle?



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52359>

    "the registrar"



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52360>

    Can you print the framework id and the task id? The task id is not unique across frameworks.



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52362>

    Any reason not to place on one line?
    
    if (task != NULL && task->state() != status.state()) {



src/python/native/mesos_scheduler_driver_impl.cpp
<https://reviews.apache.org/r/14540/#comment52363>

    It's too bad our python documentation is different than our java / c++ documentation :(



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52364>

    kill newline



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52365>

    You can kill this as well, since we'll get a warning and we usually don't enforce at most 1. Up to you.


- Ben Mahler


On Oct. 10, 2013, 8:53 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 8:53 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26962
-----------------------------------------------------------

Ship it!


I've just committed this! Thanks Niklas!

- Benjamin Hindman


On Oct. 10, 2013, 10:56 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 10:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 10, 2013, 10:56 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Additional style fixes, logging and reintroduction of constness in StatusUpdate parameter


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs (updated)
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/exec/exec.cpp d370560 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp 2fd48a6 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 10, 2013, 8:53 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Please disregard previous patch - task_id and slave_id in test were not consistently renamed and breaks the build.
This patch includes the proper renaming.


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs (updated)
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/exec/exec.cpp d370560 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp 2fd48a6 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 10, 2013, 8:47 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Further style fixes and logging


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs (updated)
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/exec/exec.cpp d370560 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp 2fd48a6 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 10, 2013, 7:35 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Addresses style issues, test boiler plate and moves "Setting slave id in TaskStatus" to exec.cpp.


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs (updated)
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/exec/exec.cpp d370560 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp 2fd48a6 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > include/mesos/scheduler.hpp, line 287
> > <https://reviews.apache.org/r/14540/diff/3/?file=362817#file362817line287>
> >
> >     s/reconcileTasks/reconcile/ ? This way we could make it more generic in the future?
> 
> Benjamin Hindman wrote:
>     I'm a big +1 for moving to 'reconcile' over 'reconcileTasks', but that hasn't been the naming convention thus far (e.g., requestResources, launchTasks, killTask, declineOffer, etc). Switching to the new naming convention now will likely make using a future API easier but might also seem a bit out of place.

sgtm.


> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 1387-1390
> > <https://reviews.apache.org/r/14540/diff/3/?file=362823#file362823line1387>
> >
> >     This seems a bit harsh. What if this is a failed over master?
> >     
> >     Can we just log a warning and ignore it instead?
> 
> Benjamin Hindman wrote:
>     Can a failed over master get this message?

Yes. If the leader fails over and becomes leader again before the scheduler driver realizes it (link break or zk detector message) this could happen.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26873
-----------------------------------------------------------


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 1387-1390
> > <https://reviews.apache.org/r/14540/diff/3/?file=362823#file362823line1387>
> >
> >     This seems a bit harsh. What if this is a failed over master?
> >     
> >     Can we just log a warning and ignore it instead?
> 
> Benjamin Hindman wrote:
>     Can a failed over master get this message?
> 
> Vinod Kone wrote:
>     Yes. If the leader fails over and becomes leader again before the scheduler driver realizes it (link break or zk detector message) this could happen.

Ack, let's just print a warning and return for now.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26873
-----------------------------------------------------------


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > include/mesos/scheduler.hpp, line 287
> > <https://reviews.apache.org/r/14540/diff/3/?file=362817#file362817line287>
> >
> >     s/reconcileTasks/reconcile/ ? This way we could make it more generic in the future?
> 
> Benjamin Hindman wrote:
>     I'm a big +1 for moving to 'reconcile' over 'reconcileTasks', but that hasn't been the naming convention thus far (e.g., requestResources, launchTasks, killTask, declineOffer, etc). Switching to the new naming convention now will likely make using a future API easier but might also seem a bit out of place.
> 
> Vinod Kone wrote:
>     sgtm.

So should we go for reconcile or reconcileTasks?

If 'reconcile' - should we just overload the existing Master::reconcile() or rename it?


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26873
-----------------------------------------------------------


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > include/mesos/scheduler.hpp, line 287
> > <https://reviews.apache.org/r/14540/diff/3/?file=362817#file362817line287>
> >
> >     s/reconcileTasks/reconcile/ ? This way we could make it more generic in the future?

I'm a big +1 for moving to 'reconcile' over 'reconcileTasks', but that hasn't been the naming convention thus far (e.g., requestResources, launchTasks, killTask, declineOffer, etc). Switching to the new naming convention now will likely make using a future API easier but might also seem a bit out of place.


> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 1387-1390
> > <https://reviews.apache.org/r/14540/diff/3/?file=362823#file362823line1387>
> >
> >     This seems a bit harsh. What if this is a failed over master?
> >     
> >     Can we just log a warning and ignore it instead?

Can a failed over master get this message?


> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 1799-1802
> > <https://reviews.apache.org/r/14540/diff/3/?file=362828#file362828line1799>
> >
> >     Instead of doing it here, how about doing it in executor driver (exec.cpp) ?

SGTM.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26873
-----------------------------------------------------------


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.

> On Oct. 10, 2013, 6:09 p.m., Vinod Kone wrote:
> >

Great comments Vinod!


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26873
-----------------------------------------------------------


On Oct. 10, 2013, 7:35 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 7:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/exec/exec.cpp d370560 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/#review26873
-----------------------------------------------------------



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52270>

    s/cause/causes/



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52269>

    s/reconcileTasks/reconcile/ ? This way we could make it more generic in the future?



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/14540/#comment52271>

    ditto.



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/14540/#comment52274>

    This function name is a bit weird. AFAICT, this is only used once in master.cpp. It is probably not worth to pull this out as a helper yet.



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/14540/#comment52272>

    The convention has been to use camelCase for variables.
    
    How about 
    s/update/_update/
    s/mutable_update/update/
    
    ?



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52275>

    This seems a bit harsh. What if this is a failed over master?
    
    Can we just log a warning and ignore it instead?



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52279>

    s/down/unknown/



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52278>

    s/down/unknown/ ?



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52276>

    s/have/has/



src/master/master.cpp
<https://reviews.apache.org/r/14540/#comment52277>

    Don't you want to check that status has slave_id set? These are sent by the scheduler so there are no guarantees.



src/messages/messages.proto
<https://reviews.apache.org/r/14540/#comment52280>

    new line.



src/python/native/mesos_scheduler_driver_impl.cpp
<https://reviews.apache.org/r/14540/#comment52283>

    How about 
    
    "Master sends status updates if task status is different from last known state"



src/sched/sched.cpp
<https://reviews.apache.org/r/14540/#comment52284>

    Can you add
    
    if (!connected) {
      VLOG(1) << "Ignoring....;
      return;
    }



src/slave/slave.cpp
<https://reviews.apache.org/r/14540/#comment52285>

    Instead of doing it here, how about doing it in executor driver (exec.cpp) ?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52286>

    kill this. Times(1) is the default.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52287>

    Consider using LaunchTasks action to eliminate a lot of this boiler plate.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52289>

    kill.



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52291>

    Can you pull these TODOs to the top of this test?



src/tests/master_tests.cpp
<https://reviews.apache.org/r/14540/#comment52292>

    Why this scoping?


- Vinod Kone


On Oct. 10, 2013, 4:52 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14540/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2013, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
> If existing tasks have changed state, a status update with the new state will be sent. 
> 
> Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 957576b 
>   include/mesos/scheduler.hpp cf3ecda 
>   src/common/protobuf_utils.hpp 5c5f052 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
>   src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
>   src/master/master.hpp bed051c 
>   src/master/master.cpp 2fd48a6 
>   src/messages/messages.proto c599eb2 
>   src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
>   src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
>   src/sched/sched.cpp c399f24 
>   src/slave/slave.cpp debb2f4 
>   src/tests/master_tests.cpp 52f09d4 
> 
> Diff: https://reviews.apache.org/r/14540/diff/
> 
> 
> Testing
> -------
> 
> Tests has been provided in MasterTest.ReconcileTaskTest:
> 
> Test sends different state than current -> An update with the current state of task should be received.
> Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 10, 2013, 4:52 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Added Vinod Kone as a reviewer.


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp 2fd48a6 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/slave/slave.cpp debb2f4 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 9, 2013, 7:43 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Last ExpectedStatuses and expected_statuses removed.

Ensure set slave_id in TaskStatus during Slave::statusUpdate()


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs (updated)
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp 2fd48a6 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/slave/slave.cpp debb2f4 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen


Re: Review Request 14540: Task reconciliation for frameworks

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14540/
-----------------------------------------------------------

(Updated Oct. 9, 2013, 12:09 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Style fixes


Repository: mesos-git


Description
-------

A framework can reconcile tasks in connection with a master fail-over by sending it's "last known state" of its tasks and confirm or reestablish the current state of the setup.
If existing tasks have changed state, a status update with the new state will be sent. 

Soon, we may be able to reliably tell if any slave or task is no longer present. A status update with TASK_LOST should be sent to the framework.


Diffs (updated)
-----

  include/mesos/mesos.proto 957576b 
  include/mesos/scheduler.hpp cf3ecda 
  src/common/protobuf_utils.hpp 5c5f052 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 6d2a03b 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java 7ef1fe7 
  src/java/src/org/apache/mesos/SchedulerDriver.java c806a55 
  src/master/master.hpp bed051c 
  src/master/master.cpp cdfae1d 
  src/messages/messages.proto c599eb2 
  src/python/native/mesos_scheduler_driver_impl.hpp 83fdc18 
  src/python/native/mesos_scheduler_driver_impl.cpp f25d41d 
  src/sched/sched.cpp c399f24 
  src/tests/master_tests.cpp 52f09d4 

Diff: https://reviews.apache.org/r/14540/diff/


Testing
-------

Tests has been provided in MasterTest.ReconcileTaskTest:

Test sends different state than current -> An update with the current state of task should be received.
Stubs have been left for future test, where test sends expected state of non-existing task -> An update with TASK_LOST should be received.


Thanks,

Niklas Nielsen