You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@geode.apache.org by Udo Kohlmeyer <uk...@gmail.com> on 2016/02/10 02:42:07 UTC

Re: Review Request 43400: GEODE-870: Handling multiple concurrent locator restarts, avoiding "split-brain" scenario where locators both think they are coordinators

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43400/
-----------------------------------------------------------

(Updated Feb. 10, 2016, 1:42 a.m.)


Review request for geode, Bruce Schuchardt and Hitesh Khamesra.


Summary (updated)
-----------------

GEODE-870: Handling multiple concurrent locator restarts, avoiding "split-brain" scenario where locators both think they are coordinators


Bugs: GEODE-870
    https://issues.apache.org/jira/browse/GEODE-870


Repository: geode


Description
-------

GEODE-870: Handling multiple concurrent locator restarts. Locators now join as members before being promoted to coordinator.


Diffs
-----

  gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java b0ddcc09190f3a91fcea860f28efe6f2c52ea3ad 
  gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 0f16ba97df7396db174dde560cb529709aeb8e43 
  gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java 91f691898fe3ec21013453787b27444dd978c7dd 
  gemfire-core/src/main/java/com/gemstone/gemfire/internal/DSFIDFactory.java b77dfdb4e2554193ed7cfa86470767ee94e32e47 
  gemfire-core/src/main/java/com/gemstone/gemfire/internal/DataSerializableFixedID.java 7b263bfb6342c09b3075d7562268cf0687c80710 
  gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorDUnitTest.java 545a0eab1ef189b384fba41b4d97abb3fe60e78a 
  gemfire-core/src/test/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeaveHelper.java PRE-CREATION 
  gemfire-core/src/test/java/com/gemstone/gemfire/test/dunit/SerializableRunnable.java 353cdc750fdc299d592e4248432945e14d6f77d0 
  gradle/rat.gradle f7826657e4133a06963149756f8b9a2a5e3d477e 

Diff: https://reviews.apache.org/r/43400/diff/


Testing
-------

LocatorDUnitTest.testMultipleLocatorsRestartingAtSameTime - This replicates the problem and passes.


Thanks,

Udo Kohlmeyer


Re: Review Request 43400: GEODE-870: Handling multiple concurrent locator restarts, avoiding "split-brain" scenario where locators both think they are coordinators

Posted by Udo Kohlmeyer <uk...@gmail.com>.

> On Feb. 11, 2016, 5:42 p.m., Bruce Schuchardt wrote:
> > gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java, line 50
> > <https://reviews.apache.org/r/43400/diff/1/?file=1239186#file1239186line50>
> >
> >     why create a logger for NetView?

Correct, was used by me for debugging purposes


> On Feb. 11, 2016, 5:42 p.m., Bruce Schuchardt wrote:
> > gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java, line 1723
> > <https://reviews.apache.org/r/43400/diff/1/?file=1239187#file1239187line1723>
> >
> >     this change should be reverted.  currentView could be modified by another thread so it's appropriate to cache it in a local variable.

Reverted code.


> On Feb. 11, 2016, 5:42 p.m., Bruce Schuchardt wrote:
> > gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java, line 756
> > <https://reviews.apache.org/r/43400/diff/1/?file=1239187#file1239187line756>
> >
> >     stopCoordinatorServices will interrupt the current thread.
> >     
> >     I was thinking you could just add a check to see if ViewCreator.run() should exit after sending out the view.

Agreed... and I can see that this would be the wrong place to have this check.


- Udo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43400/#review118880
-----------------------------------------------------------


On Feb. 10, 2016, 1:42 a.m., Udo Kohlmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43400/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2016, 1:42 a.m.)
> 
> 
> Review request for geode, Bruce Schuchardt and Hitesh Khamesra.
> 
> 
> Bugs: GEODE-870
>     https://issues.apache.org/jira/browse/GEODE-870
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> GEODE-870: Handling multiple concurrent locator restarts. Locators now join as members before being promoted to coordinator.
> 
> 
> Diffs
> -----
> 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java b0ddcc09190f3a91fcea860f28efe6f2c52ea3ad 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 0f16ba97df7396db174dde560cb529709aeb8e43 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java 91f691898fe3ec21013453787b27444dd978c7dd 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/DSFIDFactory.java b77dfdb4e2554193ed7cfa86470767ee94e32e47 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/DataSerializableFixedID.java 7b263bfb6342c09b3075d7562268cf0687c80710 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorDUnitTest.java 545a0eab1ef189b384fba41b4d97abb3fe60e78a 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeaveHelper.java PRE-CREATION 
>   gemfire-core/src/test/java/com/gemstone/gemfire/test/dunit/SerializableRunnable.java 353cdc750fdc299d592e4248432945e14d6f77d0 
>   gradle/rat.gradle f7826657e4133a06963149756f8b9a2a5e3d477e 
> 
> Diff: https://reviews.apache.org/r/43400/diff/
> 
> 
> Testing
> -------
> 
> LocatorDUnitTest.testMultipleLocatorsRestartingAtSameTime - This replicates the problem and passes.
> 
> 
> Thanks,
> 
> Udo Kohlmeyer
> 
>


Re: Review Request 43400: GEODE-870: Handling multiple concurrent locator restarts, avoiding "split-brain" scenario where locators both think they are coordinators

Posted by Bruce Schuchardt <bs...@pivotal.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43400/#review118880
-----------------------------------------------------------




gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java (line 50)
<https://reviews.apache.org/r/43400/#comment180197>

    why create a logger for NetView?



gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java (line 695)
<https://reviews.apache.org/r/43400/#comment180198>

    stopCoordinatorServices will interrupt the current thread.
    
    I was thinking you could just add a check to see if ViewCreator.run() should exit after sending out the view.



gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java (line 918)
<https://reviews.apache.org/r/43400/#comment180200>

    I don't see how making these variable names so long is helping to make the code better.



gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java (line 1660)
<https://reviews.apache.org/r/43400/#comment180201>

    this change should be reverted.  currentView could be modified by another thread so it's appropriate to cache it in a local variable.



gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorDUnitTest.java (line 1401)
<https://reviews.apache.org/r/43400/#comment180204>

    nice test



gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorDUnitTest.java (line 1423)
<https://reviews.apache.org/r/43400/#comment180203>

    remove fine-level logging before checking in


- Bruce Schuchardt


On Feb. 10, 2016, 1:42 a.m., Udo Kohlmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43400/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2016, 1:42 a.m.)
> 
> 
> Review request for geode, Bruce Schuchardt and Hitesh Khamesra.
> 
> 
> Bugs: GEODE-870
>     https://issues.apache.org/jira/browse/GEODE-870
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> GEODE-870: Handling multiple concurrent locator restarts. Locators now join as members before being promoted to coordinator.
> 
> 
> Diffs
> -----
> 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java b0ddcc09190f3a91fcea860f28efe6f2c52ea3ad 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 0f16ba97df7396db174dde560cb529709aeb8e43 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java 91f691898fe3ec21013453787b27444dd978c7dd 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/DSFIDFactory.java b77dfdb4e2554193ed7cfa86470767ee94e32e47 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/DataSerializableFixedID.java 7b263bfb6342c09b3075d7562268cf0687c80710 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorDUnitTest.java 545a0eab1ef189b384fba41b4d97abb3fe60e78a 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeaveHelper.java PRE-CREATION 
>   gemfire-core/src/test/java/com/gemstone/gemfire/test/dunit/SerializableRunnable.java 353cdc750fdc299d592e4248432945e14d6f77d0 
>   gradle/rat.gradle f7826657e4133a06963149756f8b9a2a5e3d477e 
> 
> Diff: https://reviews.apache.org/r/43400/diff/
> 
> 
> Testing
> -------
> 
> LocatorDUnitTest.testMultipleLocatorsRestartingAtSameTime - This replicates the problem and passes.
> 
> 
> Thanks,
> 
> Udo Kohlmeyer
> 
>