You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Jiang Yan Xu <ya...@jxu.me> on 2014/01/28 02:59:29 UTC

Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Bugs: MESOS-937
    https://issues.apache.org/jira/browse/MESOS-937


Repository: mesos-git


Description
-------

Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."

By eliminating WatcherProcess this problem is resolved.


Diffs
-----

  src/state/zookeeper.hpp d1d1fedf27987aeaf9fbdee678d3b3848d05620a 
  src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
  src/zookeeper/group.hpp e51ebb2cf5f09a633462c101f913ee8272be9a6c 
  src/zookeeper/group.cpp ecb6c002e8194b8d67e262826d988f747414f9f3 
  src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
  src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
  src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 

Diff: https://reviews.apache.org/r/17440/diff/


Testing
-------

make check.
Jenkins tests ongoing, will report the results.


Thanks,

Jiang Yan Xu


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Jan. 27, 2014, 9:38 p.m., Benjamin Hindman wrote:
> > src/state/zookeeper.cpp, line 73
> > <https://reviews.apache.org/r/17440/diff/1/?file=452615#file452615line73>
> >
> >     But we aren't connecting! The 'zk' handle is set to NULL on the line above. What was the problem with DISCONNECTED? That was used originally in GroupProcess as well.

My thinking is that DISCONNECTED is a state that we set but never check against. We didn't event check against CONNECTING. So we basically only needed one state other than CONNECTED in the old code (which is renamed to READY). 


> On Jan. 27, 2014, 9:38 p.m., Benjamin Hindman wrote:
> > src/zookeeper/watcher.hpp, line 33
> > <https://reviews.apache.org/r/17440/diff/1/?file=452618#file452618line33>
> >
> >     Please document why having shared state here doesn't work due to concurrency issues.


> On Jan. 27, 2014, 9:38 p.m., Benjamin Hindman wrote:
> > src/state/zookeeper.cpp, lines 195-202
> > <https://reviews.apache.org/r/17440/diff/1/?file=452615#file452615line195>
> >
> >     This couple of lines of code are hard to follow. Basically, I can get a 'connected' callback and be in the state CONNECTING, which makes me CONNECTED, but also be in lots of other states and do nothing? Can I be in READY and then get a 'connected' callback? Why don't I transition to CONNECTED if I'm in READY? We need to be much more explicit about what states we can be in here please.

The state diagram: https://www.dropbox.com/s/8ogaehmrlhziih8/ZooKeeper%20client%20states.jpg
I should document this better but in essence the state only transitions forward unless the session expires, in which case it goes back to CONNECTING. When reconnection happens the state is remembered so that reconnection is recognized by the object.

I thought this works better than a single 'reconnect' variable because what 'reconnect' captures is really "what has already been done for this session so that we don't need to do it again unless it expires" and using states we can capture it better because there can be multiple operations/stages for this particular class that set the ZK "environment" up after it is connected to ZK. In the case of Group it is CONNECTED -> authentication -> AUTHENTICATED -> create base directory -> READY and for ZooKeeperState it is just CONNECTED -> authenticate -> READY.

I will make what's described here more explicit.


> On Jan. 27, 2014, 9:38 p.m., Benjamin Hindman wrote:
> > src/state/zookeeper.cpp, line 217
> > <https://reviews.apache.org/r/17440/diff/1/?file=452615#file452615line217>
> >
> >     We should really capture the bad state due to failed authentication rather than keep us in the CONNECTED state and also add a new READY state.

If authentication fails, 'error' is set and it indicates the 'bad state': this object is no longer 'functional'. Is this not enough?


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/#review32956
-----------------------------------------------------------


On Jan. 27, 2014, 5:59 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17440/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2014, 5:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-937
>     https://issues.apache.org/jira/browse/MESOS-937
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."
> 
> By eliminating WatcherProcess this problem is resolved.
> 
> 
> Diffs
> -----
> 
>   src/state/zookeeper.hpp d1d1fedf27987aeaf9fbdee678d3b3848d05620a 
>   src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
>   src/zookeeper/group.hpp e51ebb2cf5f09a633462c101f913ee8272be9a6c 
>   src/zookeeper/group.cpp ecb6c002e8194b8d67e262826d988f747414f9f3 
>   src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
>   src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
>   src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 
> 
> Diff: https://reviews.apache.org/r/17440/diff/
> 
> 
> Testing
> -------
> 
> make check.
> Jenkins tests ongoing, will report the results.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On Jan. 27, 2014, 9:38 p.m., Benjamin Hindman wrote:
> > src/state/zookeeper.cpp, line 73
> > <https://reviews.apache.org/r/17440/diff/1/?file=452615#file452615line73>
> >
> >     But we aren't connecting! The 'zk' handle is set to NULL on the line above. What was the problem with DISCONNECTED? That was used originally in GroupProcess as well.
> 
> Jiang Yan Xu wrote:
>     My thinking is that DISCONNECTED is a state that we set but never check against. We didn't event check against CONNECTING. So we basically only needed one state other than CONNECTED in the old code (which is renamed to READY).

I reversed the changes to state/zookeeper and instead now use a 'reconnect' boolean maintained by state/zookeeper to replicate the original logic in watcher.hpp.
I think keeping track of state changes through the 'state' variable is useful for Group though.
Right now state/zookeeper doesn't have the sophistication of retries and local session timeouts that Group has so I guess it's not necessary to copy Group's state machine.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/#review32956
-----------------------------------------------------------


On Jan. 30, 2014, 12:42 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17440/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2014, 12:42 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-937
>     https://issues.apache.org/jira/browse/MESOS-937
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."
> 
> By eliminating WatcherProcess this problem is resolved.
> 
> 
> Diffs
> -----
> 
>   src/state/zookeeper.hpp d1d1fedf27987aeaf9fbdee678d3b3848d05620a 
>   src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
>   src/zookeeper/group.hpp e51ebb2cf5f09a633462c101f913ee8272be9a6c 
>   src/zookeeper/group.cpp ecb6c002e8194b8d67e262826d988f747414f9f3 
>   src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
>   src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
>   src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 
> 
> Diff: https://reviews.apache.org/r/17440/diff/
> 
> 
> Testing
> -------
> 
> make check locally and on Jenkins.
> 
> 
> File Attachments
> ----------------
> 
> ZK Client States
>   https://reviews.apache.org/media/uploaded/files/2014/01/28/59be560e-3e4a-407a-acc9-baf07183fe32__ZooKeeper_client_states.jpg
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/#review32956
-----------------------------------------------------------



src/state/zookeeper.cpp
<https://reviews.apache.org/r/17440/#comment62023>

    But we aren't connecting! The 'zk' handle is set to NULL on the line above. What was the problem with DISCONNECTED? That was used originally in GroupProcess as well.



src/state/zookeeper.cpp
<https://reviews.apache.org/r/17440/#comment62024>

    This couple of lines of code are hard to follow. Basically, I can get a 'connected' callback and be in the state CONNECTING, which makes me CONNECTED, but also be in lots of other states and do nothing? Can I be in READY and then get a 'connected' callback? Why don't I transition to CONNECTED if I'm in READY? We need to be much more explicit about what states we can be in here please.



src/state/zookeeper.cpp
<https://reviews.apache.org/r/17440/#comment62021>

    We should really capture the bad state due to failed authentication rather than keep us in the CONNECTED state and also add a new READY state.



src/zookeeper/watcher.hpp
<https://reviews.apache.org/r/17440/#comment62025>

    Please document why having shared state here doesn't work due to concurrency issues.


- Benjamin Hindman


On Jan. 28, 2014, 1:59 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17440/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2014, 1:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-937
>     https://issues.apache.org/jira/browse/MESOS-937
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."
> 
> By eliminating WatcherProcess this problem is resolved.
> 
> 
> Diffs
> -----
> 
>   src/state/zookeeper.hpp d1d1fedf27987aeaf9fbdee678d3b3848d05620a 
>   src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
>   src/zookeeper/group.hpp e51ebb2cf5f09a633462c101f913ee8272be9a6c 
>   src/zookeeper/group.cpp ecb6c002e8194b8d67e262826d988f747414f9f3 
>   src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
>   src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
>   src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 
> 
> Diff: https://reviews.apache.org/r/17440/diff/
> 
> 
> Testing
> -------
> 
> make check.
> Jenkins tests ongoing, will report the results.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/#review34127
-----------------------------------------------------------

Ship it!


Ship It!

- Vinod Kone


On Feb. 7, 2014, 9:50 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17440/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2014, 9:50 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-937
>     https://issues.apache.org/jira/browse/MESOS-937
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."
> 
> By eliminating WatcherProcess this problem is resolved.
> 
> 
> Diffs
> -----
> 
>   src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
>   src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 
> 
> Diff: https://reviews.apache.org/r/17440/diff/
> 
> 
> Testing
> -------
> 
> make check locally and on Jenkins.
> 
> 
> File Attachments
> ----------------
> 
> ZK Client States
>   https://reviews.apache.org/media/uploaded/files/2014/01/28/59be560e-3e4a-407a-acc9-baf07183fe32__ZooKeeper_client_states.jpg
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/
-----------------------------------------------------------

(Updated Feb. 7, 2014, 1:50 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

- Moved Group change to another review: https://reviews.apache.org/r/17858
- Added comments that watchers are called from a single thread.
- Turned out watcher and other classes don't need to change if the watcher is called from a single thread.


Bugs: MESOS-937
    https://issues.apache.org/jira/browse/MESOS-937


Repository: mesos-git


Description
-------

Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."

By eliminating WatcherProcess this problem is resolved.


Diffs (updated)
-----

  src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
  src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 

Diff: https://reviews.apache.org/r/17440/diff/


Testing
-------

make check locally and on Jenkins.


File Attachments
----------------

ZK Client States
  https://reviews.apache.org/media/uploaded/files/2014/01/28/59be560e-3e4a-407a-acc9-baf07183fe32__ZooKeeper_client_states.jpg


Thanks,

Jiang Yan Xu


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/
-----------------------------------------------------------

(Updated Jan. 30, 2014, 12:42 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

- According to BenH's comments, reversed changes to state/zookeeper that makes its connection state handling similar to Group's.
- Group now does not retry authenticate() or create() after reconnection within the same session if they previously succeeded.


Bugs: MESOS-937
    https://issues.apache.org/jira/browse/MESOS-937


Repository: mesos-git


Description
-------

Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."

By eliminating WatcherProcess this problem is resolved.


Diffs (updated)
-----

  src/state/zookeeper.hpp d1d1fedf27987aeaf9fbdee678d3b3848d05620a 
  src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
  src/zookeeper/group.hpp e51ebb2cf5f09a633462c101f913ee8272be9a6c 
  src/zookeeper/group.cpp ecb6c002e8194b8d67e262826d988f747414f9f3 
  src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
  src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
  src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 

Diff: https://reviews.apache.org/r/17440/diff/


Testing (updated)
-------

make check locally and on Jenkins.


File Attachments
----------------

ZK Client States
  https://reviews.apache.org/media/uploaded/files/2014/01/28/59be560e-3e4a-407a-acc9-baf07183fe32__ZooKeeper_client_states.jpg


Thanks,

Jiang Yan Xu


Re: Review Request 17440: Removed an unnecessary intermediate ZooKeeper event handler (WatcherProcess) which has a bug in ensuring the lifecycles of WatcherProcess and Watcher match each other thus causing MESOS-937.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17440/
-----------------------------------------------------------

(Updated Jan. 28, 2014, 11:05 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
-------

Attached the state diagram


Bugs: MESOS-937
    https://issues.apache.org/jira/browse/MESOS-937


Repository: mesos-git


Description
-------

Description in JIRA: "Between the execution of ProcessWatcher::~ProcessWatcher() and its base class destructor Watcher::~Watcher(), the pure virtual method Watcher::process() can be invoked by WatcherProcess::event()."

By eliminating WatcherProcess this problem is resolved.


Diffs
-----

  src/state/zookeeper.hpp d1d1fedf27987aeaf9fbdee678d3b3848d05620a 
  src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
  src/zookeeper/group.hpp e51ebb2cf5f09a633462c101f913ee8272be9a6c 
  src/zookeeper/group.cpp ecb6c002e8194b8d67e262826d988f747414f9f3 
  src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
  src/zookeeper/zookeeper.hpp f50aca6e7035c8084c3e76fd56b9d1ef7f9d9902 
  src/zookeeper/zookeeper.cpp 5720f4c1cd51c4d998b52e7bf7f9f019ae80c5f8 

Diff: https://reviews.apache.org/r/17440/diff/


Testing
-------

make check.
Jenkins tests ongoing, will report the results.


File Attachments (updated)
----------------

ZK Client States
  https://reviews.apache.org/media/uploaded/files/2014/01/28/59be560e-3e4a-407a-acc9-baf07183fe32__ZooKeeper_client_states.jpg


Thanks,

Jiang Yan Xu