You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/05/18 02:10:57 UTC

Review Request: Fixed master to refactor tasks reconciliation when slave re-registers.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

See summary.


Diffs
-----

  src/master/master.hpp d3790dc2e8b2446736e8b7bcb39c5df54a791237 
  src/master/master.cpp c44f2b79eb2b2494b4cf3478f2566f3c117b2f83 

Diff: https://reviews.apache.org/r/11229/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/
-----------------------------------------------------------

(Updated June 7, 2013, 10:05 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased for posterity. NNFR.


Description
-------

Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.


This addresses bug MESOS-447.
    https://issues.apache.org/jira/browse/MESOS-447


Diffs (updated)
-----

  src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
  src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
  src/slave/slave.hpp 26dc96e5f2fdc0711fc49a9ea80b7f037bf93c29 
  src/slave/slave.cpp 8ce1646f2f804bc8dc20506d11078191f0274654 
  src/tests/fault_tolerance_tests.cpp ef570b7e4b61df5e452628a5ea7c75888a0797ec 

Diff: https://reviews.apache.org/r/11229/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/
-----------------------------------------------------------

(Updated June 7, 2013, 8:37 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's.


Description
-------

Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.


This addresses bug MESOS-447.
    https://issues.apache.org/jira/browse/MESOS-447


Diffs (updated)
-----

  src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
  src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
  src/slave/slave.hpp 26dc96e5f2fdc0711fc49a9ea80b7f037bf93c29 
  src/slave/slave.cpp 8ce1646f2f804bc8dc20506d11078191f0274654 
  src/tests/fault_tolerance_tests.cpp ef570b7e4b61df5e452628a5ea7c75888a0797ec 

Diff: https://reviews.apache.org/r/11229/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Vinod Kone <vi...@gmail.com>.

> On June 7, 2013, 5:50 a.m., Benjamin Hindman wrote:
> > src/master/master.cpp, line 1655
> > <https://reviews.apache.org/r/11229/diff/3/?file=301601#file301601line1655>
> >
> >     I'm not sure I understand fully why we need to filter 'tasks' into 'slaveTasks'. Can't we just do the 'foreachvalue (Task* task, utils::copy(slave->tasks))' below and have a nested for loop which looks in 'tasks' to see if the slave knows about it?

that would be a O(n^2) operation instead of O(n). also, I'm not filtering, as much as converting a vector to map. am I missing something?


> On June 7, 2013, 5:50 a.m., Benjamin Hindman wrote:
> > src/tests/fault_tolerance_tests.cpp, line 1456
> > <https://reviews.apache.org/r/11229/diff/3/?file=301604#file301604line1456>
> >
> >     s/PendingUpdates/IncompleteTasks/? Just going off your naming above ...

thank you. forgot to change the name after we changed the semantics in the slave.


> On June 7, 2013, 5:50 a.m., Benjamin Hindman wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 1525-1527
> > <https://reviews.apache.org/r/11229/diff/3/?file=301604#file301604line1525>
> >
> >     To really be sure about this you'll want to pause before you post the NewMasterDetectedMessage.

N/A since the retried TASK_FINISHED ensures the receipt of TASK_LOST, if any.


> On June 7, 2013, 5:50 a.m., Benjamin Hindman wrote:
> > src/tests/fault_tolerance_tests.cpp, line 1501
> > <https://reviews.apache.org/r/11229/diff/3/?file=301604#file301604line1501>
> >
> >     Ideally we just drop the one, and the test later shows the TASK_FINISHED makes its way through ...

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/#review21561
-----------------------------------------------------------


On June 7, 2013, 8:37 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11229/
> -----------------------------------------------------------
> 
> (Updated June 7, 2013, 8:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.
> 
> 
> This addresses bug MESOS-447.
>     https://issues.apache.org/jira/browse/MESOS-447
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
>   src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
>   src/slave/slave.hpp 26dc96e5f2fdc0711fc49a9ea80b7f037bf93c29 
>   src/slave/slave.cpp 8ce1646f2f804bc8dc20506d11078191f0274654 
>   src/tests/fault_tolerance_tests.cpp ef570b7e4b61df5e452628a5ea7c75888a0797ec 
> 
> Diff: https://reviews.apache.org/r/11229/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/#review21561
-----------------------------------------------------------

Ship it!


IIUC we can simplify Master::reconcileTasks, and I'd love to see that done before this is committed. I think we could also improve the test by pausing the clock sooner, doing everything else the test is doing, and then advancing time and actually seeing the TASK_FINISHED eventually make it's way through. That would be awesome.


src/master/master.cpp
<https://reviews.apache.org/r/11229/#comment44606>

    s/Consolidate/Reconcile/



src/master/master.cpp
<https://reviews.apache.org/r/11229/#comment44605>

    I'm not sure I understand fully why we need to filter 'tasks' into 'slaveTasks'. Can't we just do the 'foreachvalue (Task* task, utils::copy(slave->tasks))' below and have a nested for loop which looks in 'tasks' to see if the slave knows about it?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/11229/#comment44607>

    s/PendingUpdates/IncompleteTasks/? Just going off your naming above ...



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/11229/#comment44608>

    Kill this line.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/11229/#comment44611>

    Ideally we just drop the one, and the test later shows the TASK_FINISHED makes its way through ...



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/11229/#comment44609>

    s/statusUpdate =/_statusUpdate =/



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/11229/#comment44610>

    Do you want to move this up before you do driver.launchTasks? I understand that you want to imply that when the slave reregisters a TASK_LOST will not make it's way to the scheduler, but it seems like we also want to capture that driver.launchTasks does not in fact cause statusUpdate to get invoked on the scheduler either.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/11229/#comment44612>

    To really be sure about this you'll want to pause before you post the NewMasterDetectedMessage.


- Benjamin Hindman


On June 6, 2013, 8:40 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11229/
> -----------------------------------------------------------
> 
> (Updated June 6, 2013, 8:40 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.
> 
> 
> This addresses bug MESOS-447.
>     https://issues.apache.org/jira/browse/MESOS-447
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
>   src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
>   src/slave/slave.hpp 26dc96e5f2fdc0711fc49a9ea80b7f037bf93c29 
>   src/slave/slave.cpp 8ce1646f2f804bc8dc20506d11078191f0274654 
>   src/tests/fault_tolerance_tests.cpp ef570b7e4b61df5e452628a5ea7c75888a0797ec 
> 
> Diff: https://reviews.apache.org/r/11229/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/
-----------------------------------------------------------

(Updated June 6, 2013, 8:40 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

slave no longer need to send updates when re-registering.


Description
-------

Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.


This addresses bug MESOS-447.
    https://issues.apache.org/jira/browse/MESOS-447


Diffs (updated)
-----

  src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
  src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
  src/slave/slave.hpp 26dc96e5f2fdc0711fc49a9ea80b7f037bf93c29 
  src/slave/slave.cpp 8ce1646f2f804bc8dc20506d11078191f0274654 
  src/tests/fault_tolerance_tests.cpp ef570b7e4b61df5e452628a5ea7c75888a0797ec 

Diff: https://reviews.apache.org/r/11229/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/
-----------------------------------------------------------

(Updated June 4, 2013, 9:28 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

added bug id.


Description
-------

Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.


This addresses bug MESOS-447.
    https://issues.apache.org/jira/browse/MESOS-447


Diffs
-----

  src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
  src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
  src/messages/messages.proto 2c196eee0f3adba90799878b7e126bc85f635b65 
  src/slave/slave.cpp e905ab3cd965002592c1f70c85ea3378ac8982b0 
  src/tests/fault_tolerance_tests.cpp c8fad7c02abcafa8ee9351a8b40684dba7f2c711 

Diff: https://reviews.apache.org/r/11229/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly do task reconciliation when slave re-registers.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11229/
-----------------------------------------------------------

(Updated June 4, 2013, 9:28 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Summary (updated)
-----------------

Fixed master to properly do task reconciliation when slave re-registers.


Description (updated)
-------

Reopened the discarded review (includes refactor of task reconciliation) to fix task reconciliation.


Diffs (updated)
-----

  src/master/master.hpp 8e7b74cccf46bea8dd5cb8543bf083aed64d370a 
  src/master/master.cpp a2e4b905f1ef5c00560917c133b27ac978988807 
  src/messages/messages.proto 2c196eee0f3adba90799878b7e126bc85f635b65 
  src/slave/slave.cpp e905ab3cd965002592c1f70c85ea3378ac8982b0 
  src/tests/fault_tolerance_tests.cpp c8fad7c02abcafa8ee9351a8b40684dba7f2c711 

Diff: https://reviews.apache.org/r/11229/diff/


Testing
-------

make check


Thanks,

Vinod Kone