You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Jiang Yan Xu <ya...@jxu.me> on 2013/12/16 11:08:41 UTC

Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Bugs: MESOS-883
    https://issues.apache.org/jira/browse/MESOS-883


Repository: mesos-git


Description
-------

See summary.


Diffs
-----

  src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
  src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
  src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
  src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
  src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
  src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
  src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
  src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
  src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 

Diff: https://reviews.apache.org/r/16290/diff/


Testing
-------

make check.


Thanks,

Jiang Yan Xu


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30709
-----------------------------------------------------------

Ship it!


Ship It!

- Ben Mahler


On Dec. 19, 2013, 6:55 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2013, 6:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30727
-----------------------------------------------------------

Ship it!


Ship It!

- Ben Mahler


On Dec. 19, 2013, 10:18 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2013, 10:18 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
>   src/master/master.cpp 30cea5431a8eb81cb3751a66889a3c3d27b425a0 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/
-----------------------------------------------------------

(Updated Dec. 19, 2013, 10:18 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Resolved the post review issue so this patch now includes src/master/master.cpp (with changes based on BenM's comment on https://reviews.apache.org/r/16393/)


Bugs: MESOS-883
    https://issues.apache.org/jira/browse/MESOS-883


Repository: mesos-git


Description
-------

With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.


Diffs (updated)
-----

  src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
  src/master/master.cpp 30cea5431a8eb81cb3751a66889a3c3d27b425a0 
  src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
  src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
  src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
  src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
  src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
  src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
  src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
  src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 

Diff: https://reviews.apache.org/r/16290/diff/


Testing
-------

make check.


Thanks,

Jiang Yan Xu


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/
-----------------------------------------------------------

(Updated Dec. 19, 2013, 7:49 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Bugs: MESOS-883
    https://issues.apache.org/jira/browse/MESOS-883


Repository: mesos-git


Description
-------

With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.


Diffs
-----

  src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
  src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
  src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
  src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
  src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
  src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
  src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
  src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
  src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 

Diff: https://reviews.apache.org/r/16290/diff/


Testing
-------

make check.


Thanks,

Jiang Yan Xu


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/
-----------------------------------------------------------

(Updated Dec. 19, 2013, 6:55 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Addressed BenM's comments.

Due to error in RB's mesos repo: changes to `src/master/master.cpp` had to be excluded from this patch and submitted as a later view: https://reviews.apache.org/r/16393
The ticket for RB repo issue: https://issues.apache.org/jira/browse/INFRA-7115


Bugs: MESOS-883
    https://issues.apache.org/jira/browse/MESOS-883


Repository: mesos-git


Description
-------

With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.


Diffs (updated)
-----

  src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
  src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
  src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
  src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
  src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
  src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
  src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
  src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
  src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 

Diff: https://reviews.apache.org/r/16290/diff/


Testing
-------

make check.


Thanks,

Jiang Yan Xu


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Dec. 18, 2013, 7:50 p.m., Ben Mahler wrote:
> > src/zookeeper/contender.cpp, line 123
> > <https://reviews.apache.org/r/16290/diff/3/?file=399313#file399313line123>
> >
> >     Looks like this is a failure to contend rather than a consistency issue, so it seems Failure is more appropriate here.
> 
> Jiang Yan Xu wrote:
>     Here it's a CHECK because this usage violates the instructions in contender.hpp:
>     
>     "
>     // Note that the contender is NOT reusable, which means its methods
>     // are supposed to be called once and the client needs to create a
>     // new instance to contend again.
>     "
>     
>     It was suggested in Vinod's review. Sound alright?

Changing this to use Failure after talking to BenM offline. Will update coding conventions for this.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30628
-----------------------------------------------------------


On Dec. 19, 2013, 6:55 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2013, 6:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Dec. 18, 2013, 7:50 p.m., Ben Mahler wrote:
> > src/zookeeper/contender.cpp, line 148
> > <https://reviews.apache.org/r/16290/diff/3/?file=399313#file399313line148>
> >
> >     Is this also a Failure to withdraw?

This is "impossible not happen" because Group doesn't discard futures unless it's destructing - which shouldn't happen either here because Group outlives the contender.


> On Dec. 18, 2013, 7:50 p.m., Ben Mahler wrote:
> > src/zookeeper/contender.cpp, line 123
> > <https://reviews.apache.org/r/16290/diff/3/?file=399313#file399313line123>
> >
> >     Looks like this is a failure to contend rather than a consistency issue, so it seems Failure is more appropriate here.

Here it's a CHECK because this usage violates the instructions in contender.hpp:

"
// Note that the contender is NOT reusable, which means its methods
// are supposed to be called once and the client needs to create a
// new instance to contend again.
"

It was suggested in Vinod's review. Sound alright?


> On Dec. 18, 2013, 7:50 p.m., Ben Mahler wrote:
> > src/zookeeper/contender.cpp, lines 138-139
> > <https://reviews.apache.org/r/16290/diff/3/?file=399313#file399313line138>
> >
> >     I'm not sure this should be a CHECK failure since it can be handled gracefully. Can this return a failure instead?

Same as above.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30628
-----------------------------------------------------------


On Dec. 18, 2013, 6 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2013, 6 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
>   src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30628
-----------------------------------------------------------


Looking much better, thanks Yan!


src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58673>

    Just a thought for later: was there a need for initialize or could the contend() call have taken the UPID as an argument?



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58678>

    No need for an else block now that we always EXIT.



src/sched/sched.cpp
<https://reviews.apache.org/r/16290/#comment58683>

    I see you updated this code in r/16291 but I think it belongs in this change since you've made the same update to the Master's detection code.



src/slave/slave.cpp
<https://reviews.apache.org/r/16290/#comment58689>

    Ditto on updating failure handling in this change instead of r/16291.



src/tests/master_contender_detector_tests.cpp
<https://reviews.apache.org/r/16290/#comment58691>

    Can we keep this test but ensure that it does not fail?



src/tests/zookeeper_tests.cpp
<https://reviews.apache.org/r/16290/#comment58693>

    Ditto, is it possible to keep this and ensure we handle this without failing?



src/zookeeper/contender.cpp
<https://reviews.apache.org/r/16290/#comment58708>

    Looks like this is a failure to contend rather than a consistency issue, so it seems Failure is more appropriate here.



src/zookeeper/contender.cpp
<https://reviews.apache.org/r/16290/#comment58699>

    I'm not sure this should be a CHECK failure since it can be handled gracefully. Can this return a failure instead?



src/zookeeper/contender.cpp
<https://reviews.apache.org/r/16290/#comment58700>

    Is this also a Failure to withdraw?



src/zookeeper/contender.cpp
<https://reviews.apache.org/r/16290/#comment58717>

    CHECK_SOME



src/zookeeper/group.hpp
<https://reviews.apache.org/r/16290/#comment58719>

    remove const&



src/zookeeper/group.cpp
<https://reviews.apache.org/r/16290/#comment58720>

    You could rename this to message, and pass 'message' into the second argument for the calls to fail(), might be a bit cleaner.


- Ben Mahler


On Dec. 18, 2013, 6 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2013, 6 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
>   src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/
-----------------------------------------------------------

(Updated Dec. 18, 2013, 6 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Discard futures when contender destructs.


Bugs: MESOS-883
    https://issues.apache.org/jira/browse/MESOS-883


Repository: mesos-git


Description
-------

With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.


Diffs (updated)
-----

  src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
  src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
  src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
  src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
  src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
  src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
  src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
  src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
  src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
  src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 

Diff: https://reviews.apache.org/r/16290/diff/


Testing
-------

make check.


Thanks,

Jiang Yan Xu


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/
-----------------------------------------------------------

(Updated Dec. 18, 2013, 12:32 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

BenM's comments.


Bugs: MESOS-883
    https://issues.apache.org/jira/browse/MESOS-883


Repository: mesos-git


Description (updated)
-------

With this patch the group only returns failed futures when the error is non-retryable and client (master, slave, sched) should always terminate.


Diffs (updated)
-----

  src/master/master.hpp 6c168a2cdbd8343516cb47adceaff70c3d46690b 
  src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
  src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
  src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
  src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
  src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
  src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
  src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
  src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
  src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 

Diff: https://reviews.apache.org/r/16290/diff/


Testing
-------

make check.


Thanks,

Jiang Yan Xu


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Ben Mahler <be...@gmail.com>.

> On Dec. 16, 2013, 7:41 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 689
> > <https://reviews.apache.org/r/16290/diff/1/?file=398158#file398158line689>
> >
> >     This seems like an unexpected failure than an expected event so I would use LOG(FATAL) here. Is my understanding correct here?

The con of using LOG(FATAL) in this case is that we do not need the stack trace, and I suspect misconfiguration could actually cause these failures, in which case, EXIT(1) seems more appropriate. So I think I take back my comment here, thoughts?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30463
-----------------------------------------------------------


On Dec. 16, 2013, 10:08 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2013, 10:08 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Dec. 16, 2013, 7:41 p.m., Ben Mahler wrote:
> > src/master/master.cpp, line 689
> > <https://reviews.apache.org/r/16290/diff/1/?file=398158#file398158line689>
> >
> >     This seems like an unexpected failure than an expected event so I would use LOG(FATAL) here. Is my understanding correct here?
> 
> Ben Mahler wrote:
>     The con of using LOG(FATAL) in this case is that we do not need the stack trace, and I suspect misconfiguration could actually cause these failures, in which case, EXIT(1) seems more appropriate. So I think I take back my comment here, thoughts?
> 
> Jiang Yan Xu wrote:
>     I lean towards using EXIT(1).

They are arguments both ways but I guess "expected" is more subjective while "need stack trace" is easier to determine so maybe we'll use it to judge between the two. We can put it in our coding conventions.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30463
-----------------------------------------------------------


On Dec. 16, 2013, 10:08 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2013, 10:08 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 16290: Group’s local timeout now only expires the ZK session and does not fail operations.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16290/#review30463
-----------------------------------------------------------


Partial review so far, mostly looked at the changes to the Master here.


src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58285>

    Why this CHECK is restricted to only the failure case? And is it still applicable? Is there no race here between the detector and contender notifications?



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58281>

    This seems like an unexpected failure than an expected event so I would use LOG(FATAL) here. Is my understanding correct here?



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58286>

    Move to the top, as in done in lostCandidacy().
    
    Can you make this more explicit, then there's no need for the << operator:
    
    CHECK(_!contended.isDiscarded());
    
    The "MasterContender" part is a bit misleading since we're attributing the discard to the MasterContender without knowing who has actually discarded it.



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58289>

    Is there a reason we must detect after we've entered the contest?
    
    The result here is that there are now two cycles of detection going on at once, the detected() callback can initiate a call to detect() and the contended() callback and initiate a call to detect().
    
    What about making the contend/detect processes completely independent in the Master, this should be the easiest to think about and would ensure we avoid things like redundant detect() calls.



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58287>

    Ditto on attributing the discard to MasterContender, and just killing the << operator.



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58284>

    Should this failure also be fatal?



src/master/master.cpp
<https://reviews.apache.org/r/16290/#comment58290>

    Looks like this should only CHECK against discarded, and the failure case should be handled as FATAL.



src/zookeeper/group.hpp
<https://reviews.apache.org/r/16290/#comment58280>

    Can you add a note about what aborting means here? Effectively the group enters an error state and all subsequent operations will fail, right?



src/zookeeper/group.hpp
<https://reviews.apache.org/r/16290/#comment58279>

    Option<Error>



src/zookeeper/group.cpp
<https://reviews.apache.org/r/16290/#comment58278>

    "to cleaning up"?


- Ben Mahler


On Dec. 16, 2013, 10:08 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16290/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2013, 10:08 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-883
>     https://issues.apache.org/jira/browse/MESOS-883
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dd6111946289b82008a66e46df8ef5e538de70ea 
>   src/sched/sched.cpp c46535643ce2ea456bd939571fe78358c1d3871b 
>   src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5 
>   src/tests/master_contender_detector_tests.cpp 76464eab479461e6e3cb8b5afe85860e60428cf5 
>   src/tests/zookeeper_tests.cpp a0660cbfb1b7073654b05b82bffd69fbb04d0165 
>   src/zookeeper/contender.hpp d0b386eeec485f7c8e6b5e60936a84d2f0bba92d 
>   src/zookeeper/contender.cpp bb7e25555b2635398e9d56a808993bcd0b0e47f6 
>   src/zookeeper/group.hpp 088363a80108472ffeb14121fcb4da02d6e9acf4 
>   src/zookeeper/group.cpp f3ba86fec3584a83fd55d6388656ebde5249824a 
> 
> Diff: https://reviews.apache.org/r/16290/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>