Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/05/16 15:09:13 UTC

[jira] [Commented] (GEODE-1393) locator returns incorrect server information when starting up

    [ https://issues.apache.org/jira/browse/GEODE-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284701#comment-15284701 ] 

ASF subversion and git services commented on GEODE-1393:
--------------------------------------------------------

Commit 6523c97c92f607746d80b11c7cb5315b1137f5a2 in incubator-geode's branch refs/heads/develop from [~bschuchardt]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=6523c97 ]

GEODE-1393 locator returns incorrect server information when starting up

When a locator auto-reconnects, its ServerLocator needs to initialize its
ControllerAdvisor so that it has server information to give to clients.
The ServerLocator was creating a new ControllerAdvisor but wasn't asking
it to perform a handshake to fill in its profiles.

ReconnectDUnitTest had an existing testReconnectWithQuorum test that
wasn't doing what it was supposed to.  I've removed the TODO from that
test and modified it to force-disconnect the test's Locator.  The
locator must restart its TcpServer component before it can start
a DistributedSystem, so this exercises the path in
InternalLocator.attemptReconnect() that boots the TcpServer prior to
connecting the DistributedSystem.  After the DistributedSystem
finishes reconnecting, the ServerLocator's distribution advisor
should have been initialized by performing the handshake.
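
The effect of the missing handshake can be sketched roughly as follows.
This is only an illustration under stated assumptions: AdvisorSketch,
handshake() and serversForClient() are hypothetical names, a list of
strings stands in for the advisor's profiles, and none of this is the
actual Geode ControllerAdvisor or ServerLocator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an advisor; not a Geode class.
class AdvisorSketch {
    // Servers that peers in the distributed system would report (assumed input).
    private final List<String> serversKnownToPeers;
    // Profiles the advisor has actually learned so far.
    private final List<String> profiles = new ArrayList<>();

    AdvisorSketch(List<String> serversKnownToPeers) {
        this.serversKnownToPeers = serversKnownToPeers;
    }

    // Models the handshake: ask the peers and fill in the profiles.
    void handshake() {
        profiles.clear();
        profiles.addAll(serversKnownToPeers);
    }

    // What a client asking for server locations would be offered.
    List<String> serversForClient() {
        return new ArrayList<>(profiles);
    }

    public static void main(String[] args) {
        AdvisorSketch advisor = new AdvisorSketch(Arrays.asList("server1", "server2"));
        // A freshly created advisor has no profiles until it handshakes.
        System.out.println("before handshake: " + advisor.serversForClient());
        advisor.handshake();
        System.out.println("after handshake: " + advisor.serversForClient());
    }
}
```

In this sketch a newly created advisor hands out an empty server list
until handshake() is called, which mirrors the situation described above
for a locator that has just auto-reconnected.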


> locator returns incorrect server information when starting up
> -------------------------------------------------------------
>
>                 Key: GEODE-1393
>                 URL: https://issues.apache.org/jira/browse/GEODE-1393
>             Project: Geode
>          Issue Type: Bug
>          Components: locator
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>
> When starting up, a locator has no knowledge of cache servers that might be in the distributed system, but it will process server-location requests from clients and return incorrect information to them until it receives load info from the servers.
> In one test I saw a locator be ejected from the distributed system.  When it auto-reconnected, some cache clients asked it for server locations and, though there were 6 cache servers available, the clients got this exception:
> {noformat}
> com.gemstone.gemfire.cache.client.NoAvailableServersException
>         at com.gemstone.gemfire.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:257)
>         at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.getNextOpServerLocation(OpExecutorImpl.java:318)
>         at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:130)
>         at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:123)
>         at com.gemstone.gemfire.cache.client.internal.PoolImpl.execute(PoolImpl.java:714)
>         at com.gemstone.gemfire.cache.client.internal.GetOp.execute(GetOp.java:97)
>         at com.gemstone.gemfire.cache.client.internal.ServerRegionProxy.get(ServerRegionProxy.java:112)
>         at com.gemstone.gemfire.internal.cache.tx.ClientTXRegionStub.findObject(ClientTXRegionStub.java:72)
>         at com.gemstone.gemfire.internal.cache.TXStateStub.findObject(TXStateStub.java:379)
>         at com.gemstone.gemfire.internal.cache.TXStateProxyImpl.findObject(TXStateProxyImpl.java:607)
>         at com.gemstone.gemfire.internal.cache.LocalRegion.get(LocalRegion.java:1460)
>         at com.gemstone.gemfire.internal.cache.LocalRegion.get(LocalRegion.java:1398)
>         at com.gemstone.gemfire.internal.cache.LocalRegion.get(LocalRegion.java:1385)
>         at com.gemstone.gemfire.internal.cache.AbstractRegion.get(AbstractRegion.java:336)
> {noformat}
> ServerLocator has a readiness check, but it only tests whether its DistributedSystem instance variable has been initialized.  It ought to wait until it has received a server load update.
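
The difference between the two readiness checks described above might
look like the following minimal sketch.  Every name in it
(LoadAwareReadiness, isReadyOld, awaitReady, onServerLoadUpdate) is
hypothetical and chosen for illustration; it is not the ServerLocator
implementation, and the use of a CountDownLatch is an assumption about
one way to wait for the first load update.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the Geode ServerLocator class.
class LoadAwareReadiness {
    // Stand-in for the DistributedSystem instance variable.
    private volatile Object distributedSystem;
    // Released once the first server load update has arrived.
    private final CountDownLatch firstLoadUpdate = new CountDownLatch(1);

    void setDistributedSystem(Object ds) {
        this.distributedSystem = ds;
    }

    // Called when a server reports its load (assumed callback).
    void onServerLoadUpdate() {
        firstLoadUpdate.countDown();
    }

    // The check as the issue describes it: only looks at the instance variable.
    boolean isReadyOld() {
        return distributedSystem != null;
    }

    // The suggested check: also wait (with a timeout) for a load update.
    boolean awaitReady(long timeoutMillis) {
        if (distributedSystem == null) {
            return false;
        }
        try {
            return firstLoadUpdate.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        LoadAwareReadiness readiness = new LoadAwareReadiness();
        readiness.setDistributedSystem(new Object());
        System.out.println("old check: " + readiness.isReadyOld());
        System.out.println("before load update: " + readiness.awaitReady(10));
        readiness.onServerLoadUpdate();
        System.out.println("after load update: " + readiness.awaitReady(10));
    }
}
```

With a check like awaitReady, a request that arrives before any load
update would be held back or refused instead of being answered from an
empty server list.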



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)