You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/10/02 02:25:14 UTC

Review Request 14435: Fixed a bug with resource accounting in reconciliation.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Thomas Marshall, and Vinod Kone.


Bugs: MESOS-711
    https://issues.apache.org/jira/browse/MESOS-711


Repository: mesos-git


Description
-------

This adds the following two calls to the Allocator:
  slaveDisconnected(SlaveID)
  slaveReconnected(SlaveID)

Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.

I've included a test in the subsequent change in this chain.


Diffs
-----

  src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
  src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
  src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
  src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 

Diff: https://reviews.apache.org/r/14435/diff/


Testing
-------

make check

Added a test in the subsequent review.


Thanks,

Ben Mahler


Re: Review Request 14435: Fixed a bug with resource accounting in reconciliation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/
-----------------------------------------------------------

(Updated Oct. 2, 2013, 6:57 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Killed a stale comment.


Bugs: MESOS-711
    https://issues.apache.org/jira/browse/MESOS-711


Repository: mesos-git


Description
-------

This adds the following two calls to the Allocator:
  slaveDisconnected(SlaveID)
  slaveReconnected(SlaveID)

Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.

I've included a test in the subsequent change in this chain.


Diffs (updated)
-----

  src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
  src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
  src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
  src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 

Diff: https://reviews.apache.org/r/14435/diff/


Testing
-------

make check

Added a test in the subsequent review.


Thanks,

Ben Mahler


Re: Review Request 14435: Fixed a bug with resource accounting in reconciliation.

Posted by Ben Mahler <be...@gmail.com>.

> On Oct. 2, 2013, 8:02 a.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1173
> > <https://reviews.apache.org/r/14435/diff/1-2/?file=360216#file360216line1173>
> >
> >     Kill?

Thanks!


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/#review26601
-----------------------------------------------------------


On Oct. 2, 2013, 5:45 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14435/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2013, 5:45 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-711
>     https://issues.apache.org/jira/browse/MESOS-711
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This adds the following two calls to the Allocator:
>   slaveDisconnected(SlaveID)
>   slaveReconnected(SlaveID)
> 
> Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.
> 
> I've included a test in the subsequent change in this chain.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
>   src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
>   src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
>   src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 
> 
> Diff: https://reviews.apache.org/r/14435/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added a test in the subsequent review.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request 14435: Fixed a bug with resource accounting in reconciliation.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/#review26601
-----------------------------------------------------------



src/master/master.cpp
<https://reviews.apache.org/r/14435/#comment51845>

    Kill?


- Vinod Kone


On Oct. 2, 2013, 5:45 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14435/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2013, 5:45 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-711
>     https://issues.apache.org/jira/browse/MESOS-711
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This adds the following two calls to the Allocator:
>   slaveDisconnected(SlaveID)
>   slaveReconnected(SlaveID)
> 
> Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.
> 
> I've included a test in the subsequent change in this chain.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
>   src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
>   src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
>   src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 
> 
> Diff: https://reviews.apache.org/r/14435/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added a test in the subsequent review.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request 14435: Fixed a bug with resource accounting in reconciliation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/
-----------------------------------------------------------

(Updated Oct. 2, 2013, 5:45 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Vinod's review.


Bugs: MESOS-711
    https://issues.apache.org/jira/browse/MESOS-711


Repository: mesos-git


Description
-------

This adds the following two calls to the Allocator:
  slaveDisconnected(SlaveID)
  slaveReconnected(SlaveID)

Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.

I've included a test in the subsequent change in this chain.


Diffs (updated)
-----

  src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
  src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
  src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
  src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 

Diff: https://reviews.apache.org/r/14435/diff/


Testing
-------

make check

Added a test in the subsequent review.


Thanks,

Ben Mahler


Re: Review Request 14435: Fixed a bug with resource accounting in reconciliation.

Posted by Ben Mahler <be...@gmail.com>.

> On Oct. 2, 2013, 1 a.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1140
> > <https://reviews.apache.org/r/14435/diff/1/?file=360216#file360216line1140>
> >
> >     Should we do this after reconcile() call below so that superflous resources are not offered by the allocator?

Good idea!


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/#review26585
-----------------------------------------------------------


On Oct. 2, 2013, 12:25 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14435/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2013, 12:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Thomas Marshall, and Vinod Kone.
> 
> 
> Bugs: MESOS-711
>     https://issues.apache.org/jira/browse/MESOS-711
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This adds the following two calls to the Allocator:
>   slaveDisconnected(SlaveID)
>   slaveReconnected(SlaveID)
> 
> Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.
> 
> I've included a test in the subsequent change in this chain.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
>   src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
>   src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
>   src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 
> 
> Diff: https://reviews.apache.org/r/14435/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added a test in the subsequent review.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request 14435: Fixed a bug with resource accounting in reconciliation.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14435/#review26585
-----------------------------------------------------------

Ship it!



src/master/allocator.hpp
<https://reviews.apache.org/r/14435/#comment51826>

    s/checkpointing//
    
    I think the allocator doesn't need to know about the slaves checkpointing.



src/master/allocator.hpp
<https://reviews.apache.org/r/14435/#comment51827>

    s/checkpointing//



src/master/master.cpp
<https://reviews.apache.org/r/14435/#comment51828>

    Should we do this after reconcile() call below so that superflous resources are not offered by the allocator?


- Vinod Kone


On Oct. 2, 2013, 12:25 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14435/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2013, 12:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Thomas Marshall, and Vinod Kone.
> 
> 
> Bugs: MESOS-711
>     https://issues.apache.org/jira/browse/MESOS-711
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This adds the following two calls to the Allocator:
>   slaveDisconnected(SlaveID)
>   slaveReconnected(SlaveID)
> 
> Which allows the allocator to maintain Slave state when checkpointing slaves disconnect. This is in contrast to the existing code which completely removes disconnected checkpointing slaves from the allocator and re-adds them upon reconnection. However, this is complex to get correct, and ultimately had bugs: MESOS-711.
> 
> I've included a test in the subsequent change in this chain.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator.hpp 46a7370d589d939ca64d645879a09221b8491569 
>   src/master/hierarchical_allocator_process.hpp 183b205663e82aa53bb3898cac0b1be10b9a4f3f 
>   src/master/master.cpp a49b17ef43fca5b385a89731ca8776a26b61399a 
>   src/tests/mesos.hpp 8fbd56c8dd438a08673b54630bfe25d60ad5ee0e 
> 
> Diff: https://reviews.apache.org/r/14435/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added a test in the subsequent review.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>