Posted to dev@geode.apache.org by Bruce Schuchardt <bs...@pivotal.io> on 2016/12/28 17:15:53 UTC

Review Request 55074: GEODE-2253 Locator may fail to respond to a valid request

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55074/
-----------------------------------------------------------

Review request for geode, Udo Kohlmeyer and Dan Smith.


Bugs: GEODE-2238 and GEODE-2253
    https://issues.apache.org/jira/browse/GEODE-2238
    https://issues.apache.org/jira/browse/GEODE-2253


Repository: geode


Description
-------

__Background__

Initially the Locator had but one service - peer-location.

When connection pools were added to client caches, a server-location service was added to the locator. It boots up after the peer-location service and isn't actually available until after StartupMessages have been exchanged.  During the interval between joining and StartupMessage exchange the handler for server-location messages isn't available, and the Locator used to log a warning.  I fixed that by suppressing the warning if we knew that server-location was going to be enabled.

Now a Cluster Configuration service has been added that isn't initialized and available until some time after a Cache has been created in the Locator, which is an even longer gap than the server-location service had.  During this time a new member might join and try to get the shared configuration from the locator.  The locator then logs a warning and does not respond, so the new member ends up shutting down.

__The Fix__

This introduces a retry loop in the locator when a handler for an incoming message can't be found.  It waits for the handler to be installed for up to the locator-wait-time, or for 5 seconds if that property hasn't been set.
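
To illustrate the retry idea, here is a minimal sketch rather than the code in the diff. The class HandlerRegistry, its methods, and the polling interval are hypothetical names chosen for this example, and it assumes locator-wait-time is expressed in seconds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical handler registry; a sketch, not the actual TcpServer code. */
final class HandlerRegistry {
  // Fallback used when locator-wait-time is not set (5 seconds, per the description).
  static final long DEFAULT_WAIT_MILLIS = 5_000;

  private final Map<Class<?>, Function<Object, Object>> handlers = new ConcurrentHashMap<>();
  private final long waitMillis;
  private final long pollMillis;

  // locatorWaitTimeSeconds <= 0 is treated here as "property not set" (an assumption).
  HandlerRegistry(long locatorWaitTimeSeconds, long pollMillis) {
    this.waitMillis = locatorWaitTimeSeconds > 0
        ? TimeUnit.SECONDS.toMillis(locatorWaitTimeSeconds)
        : DEFAULT_WAIT_MILLIS;
    this.pollMillis = pollMillis;
  }

  void addHandler(Class<?> requestType, Function<Object, Object> handler) {
    handlers.put(requestType, handler);
  }

  /** Returns the handler's response, or null if no handler appeared before the deadline. */
  Object processRequest(Object request) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
    Function<Object, Object> handler = handlers.get(request.getClass());
    // Retry until a handler is installed or the wait time runs out.
    while (handler == null && System.nanoTime() - deadline < 0) {
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and give up
        return null;
      }
      handler = handlers.get(request.getClass());
    }
    return handler == null ? null : handler.apply(request);
  }
}
```

A request that arrives before its service has registered a handler is held for a bounded time instead of being dropped immediately, which is the behavior the new member depends on.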

I've also changed InternalLocator to always install the handler for cluster configuration status so that you can query any locator to see if it has a cluster configuration service and, if so, what state it's in.
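
As a rough sketch of what "always install the status handler" could look like: the handler below is created whether or not the service is enabled, and simply reports the current state. ConfigServiceState, ConfigStatusHandler, and the state names are hypothetical and are not taken from InternalLocator.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical states; the real status type in Geode may differ. */
enum ConfigServiceState { NOT_CONFIGURED, STARTING, RUNNING }

/** Hypothetical status handler that every locator installs unconditionally. */
final class ConfigStatusHandler {
  private final AtomicReference<ConfigServiceState> state;

  ConfigStatusHandler(boolean clusterConfigEnabled) {
    // A locator without the service still answers, reporting NOT_CONFIGURED.
    this.state = new AtomicReference<>(
        clusterConfigEnabled ? ConfigServiceState.STARTING : ConfigServiceState.NOT_CONFIGURED);
  }

  /** Called once the cluster configuration service has finished initializing. */
  void serviceStarted() {
    state.compareAndSet(ConfigServiceState.STARTING, ConfigServiceState.RUNNING);
  }

  /** Answers a status query with the current state. */
  ConfigServiceState handleStatusRequest() {
    return state.get();
  }
}
```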


Diffs
-----

  geode-core/src/main/java/org/apache/geode/distributed/internal/InternalLocator.java 59488ad66e861fde67840f75dae413f247c821f4 
  geode-core/src/main/java/org/apache/geode/distributed/internal/tcpserver/TcpServer.java 83fdd0b1697f541034a91e02e1e0c96de4493cd0 
  geode-core/src/test/java/org/apache/geode/distributed/LocatorJUnitTest.java 65f09476ff06542739c63147dab29b71d5550dd0 

Diff: https://reviews.apache.org/r/55074/diff/


Testing
-------

precheckin, new unit test


Thanks,

Bruce Schuchardt


Re: Review Request 55074: GEODE-2253 Locator may fail to respond to a valid request

Posted by Udo Kohlmeyer <uk...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55074/#review160436
-----------------------------------------------------------


Ship it!

- Udo Kohlmeyer

