Posted to issues@geode.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/12/09 01:13:00 UTC

[jira] [Commented] (GEODE-9808) Client ops fail with NoLocatorsAvailableException when all servers leave the DS

    [ https://issues.apache.org/jira/browse/GEODE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456072#comment-17456072 ] 

ASF subversion and git services commented on GEODE-9808:
--------------------------------------------------------

Commit 1e66771a546462e89b6e11aaef294fb0e05d524c in geode's branch refs/heads/develop from Donal Evans
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=1e66771 ]

GEODE-9808: Throw appropriate exception in AutoConnectionSourceImpl (#7143)

 - Throw NoAvailableServersException instead of NoAvailableLocatorsException in
AutoConnectionSourceImpl if queryLocators() returns a response with no result
 - Refactor and fix up AutoConnectionSourceImplJUnitTest
 - Modify tests in AutoConnectionSourceImplJUnitTest to cover new
 behaviour

Authored-by: Donal Evans <do...@vmware.com>

> Client ops fail with NoLocatorsAvailableException when all servers leave the DS 
> --------------------------------------------------------------------------------
>
>                 Key: GEODE-9808
>                 URL: https://issues.apache.org/jira/browse/GEODE-9808
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>    Affects Versions: 1.15.0
>            Reporter: Bill Burcham
>            Assignee: Donal Evans
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> When there are no cache servers (only locators) in a cluster, client operations will fail with a misleading exception:
> {noformat}
> org.apache.geode.cache.client.NoAvailableLocatorsException: Unable to connect to any locators in the list [gemfire-cluster-locator-0.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334, gemfire-cluster-locator-1.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334, gemfire-cluster-locator-2.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334]
>     at org.apache.geode.cache.client.internal.AutoConnectionSourceImpl.findServer(AutoConnectionSourceImpl.java:174)
>     at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:211)
>     at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:196)
>     at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.forceCreateConnection(ConnectionManagerImpl.java:227)
>     at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.exchangeConnection(ConnectionManagerImpl.java:365)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:161)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:120)
>     at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:805)
>     at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
> {noformat}
> Even though the client is able to connect to a locator, we encounter a NoAvailableLocatorsException with the message "Unable to connect to any locators in the list".
> Investigating the product code we see:
>  # If there are no cache servers in the cluster, ServerLocator.pickServer() constructs a ClientConnectionResponse(null), which causes that object’s hasResult() to return false in the loop termination check in AutoConnectionSourceImpl.queryLocators().
>  # The exception wording is misleading not only in AutoConnectionSourceImpl.findServer() but also in at least two other calling locations in AutoConnectionSourceImpl: findReplacementServer() and findServersForQueue().
>  # In each of those cases the calling method translates a null response from queryLocators() into a throw of a NoAvailableLocatorsException.
>  # An appropriate exception, NoAvailableServersException, already exists for the case where we were able to contact a locator but the locator was not able to find any cache servers.
>  # According to my Git spelunking, queryLocators() has been obfuscating the true cause of the failure since at least 2015.
> Without analyzing ServerLocator.pickServer() (LocatorLoadSnapshot.getServerForConnection()) to discern why two locators might disagree on how many cache servers are in the cluster, it seems to me that we should modify AutoConnectionSourceImpl.queryLocators() so that:
>  * if it gets a ServerLocationResponse with hasResult() true, it immediately returns that as it does now
>  * otherwise it keeps trying and it keeps track of the last (non-null) ServerLocationResponse it has received
>  * it returns the last non-null ServerLocationResponse it received (otherwise it returns null)
> With that in hand, we can change the three call locations in AutoConnectionSourceImpl: findServer(), findReplacementServer(), and findServersForQueue() to each throw NoAvailableLocatorsException if no locator responded, or NoAvailableServersException if a locator responded with a ClientConnectionResponse for which hasResult() returns false.
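
The proposal above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Geode implementation: the Response interface, the modelling of each locator as a Supplier that returns null when it cannot be reached, the exception messages, and the class name QueryLocatorsSketch are all assumptions made for the example. Only the method names queryLocators() and findServer() and the two exception names come from the ticket, and the exceptions here are local stand-ins rather than the real Geode classes.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the behaviour proposed in the ticket; not Geode code.
final class QueryLocatorsSketch {

    // Stand-in for ServerLocationResponse: only hasResult() matters here.
    interface Response {
        boolean hasResult();
    }

    // Local stand-ins for the two exceptions named in the ticket.
    static class NoAvailableLocatorsException extends RuntimeException {
        NoAvailableLocatorsException(String message) {
            super(message);
        }
    }

    static class NoAvailableServersException extends RuntimeException {
        NoAvailableServersException(String message) {
            super(message);
        }
    }

    // Assumption: each locator is modelled as a Supplier that returns a
    // Response, or null when the locator could not be contacted.
    static Response queryLocators(List<Supplier<Response>> locators) {
        Response lastResponse = null;
        for (Supplier<Response> locator : locators) {
            Response response = locator.get();
            if (response == null) {
                continue; // locator unreachable, try the next one
            }
            if (response.hasResult()) {
                return response; // a usable result is returned immediately
            }
            lastResponse = response; // remember the last non-null response
        }
        // Null only if no locator responded at all.
        return lastResponse;
    }

    // The same distinction would apply in findReplacementServer() and
    // findServersForQueue().
    static Response findServer(List<Supplier<Response>> locators) {
        Response response = queryLocators(locators);
        if (response == null) {
            throw new NoAvailableLocatorsException(
                "Unable to connect to any locators in the list");
        }
        if (!response.hasResult()) {
            throw new NoAvailableServersException(
                "A locator responded but reported no cache servers");
        }
        return response;
    }
}
```

With this shape, a client whose locators are reachable but know of no cache servers would see NoAvailableServersException, and NoAvailableLocatorsException would be reserved for the case where every locator query returned null.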


