Posted to notifications@geode.apache.org by "bschuchardt (GitHub)" <gi...@apache.org> on 2019/01/24 18:29:12 UTC

[GitHub] [geode] bschuchardt opened pull request #3118: GEODE-6309 ClusterConfigLocatorRestartDUnitTest fails to spin up a new server

This modifies auto-reconnect to lengthen the time a Locator will attempt
to join from 24 seconds to 60 seconds and prevents the Locator from
creating its own cluster (which would form a split-brain).  In an
auto-reconnect attempt the location service does not start up until a
quorum of the old cluster can be contacted.  A quorum means that some
process is still in the cluster and should have taken over the role of
membership coordinator, so the Locator needs to join using that
coordinator and must not create its own cluster.
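
The join behavior described above can be sketched roughly as follows. This is
an illustration only, not the code in this PR: the class, method, and
parameter names are hypothetical, and the coordinator lookup is stubbed out as
a `Supplier`. The point it shows is that a reconnecting Locator keeps retrying
until the join window closes and gives up rather than forming a new cluster.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from the Geode code base. */
final class ReconnectJoinSketch {
  // Join window from the description above: raised from 24 seconds to 60 seconds.
  static final long JOIN_TIMEOUT_MS = 60_000;

  /**
   * Polls findCoordinator until it yields an address or timeoutMs elapses.
   * Returns empty on timeout or interruption. The caller must treat empty as
   * "join failed" and must not fall back to creating its own cluster.
   */
  static Optional<String> joinExistingCluster(
      Supplier<Optional<String>> findCoordinator, long timeoutMs, long retryIntervalMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      Optional<String> coordinator = findCoordinator.get();
      if (coordinator.isPresent()) {
        return coordinator;
      }
      if (System.nanoTime() >= deadline) {
        return Optional.empty();
      }
      try {
        Thread.sleep(retryIntervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return Optional.empty();
      }
    }
  }
}
```

A caller would pass `ReconnectJoinSketch.JOIN_TIMEOUT_MS` as the window and
retry the whole reconnect attempt later if the result is empty.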

This also corrects the handling of the old membership view in
GMSLocator.  The restarted location service was incorrectly using this
old view as an authority on who had the role of coordinator, but it
should only be used as a hint.  The fix puts the old view into the
recoveredView variable and assigns it an invalid viewID.
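
One way to picture the "hint, not authority" distinction is the sketch below.
It is not the GMSLocator code: the `View` class, the `-1` sentinel, and the
`joinCandidates` helper are all assumptions made for illustration. It shows a
recovered view keeping its member list while losing its claim to name the
coordinator.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; the types and the -1 sentinel are assumptions. */
final class RecoveredViewSketch {
  static final int INVALID_VIEW_ID = -1;

  static final class View {
    final int viewId;
    final String coordinator;
    final List<String> members;

    View(int viewId, String coordinator, List<String> members) {
      this.viewId = viewId;
      this.coordinator = coordinator;
      this.members = members;
    }

    /** Only a view with a valid id may be trusted to name the coordinator. */
    boolean isAuthoritative() {
      return viewId >= 0;
    }
  }

  /** Demotes an old view to a hint: same contents, invalid view id. */
  static View asRecoveredHint(View old) {
    return new View(INVALID_VIEW_ID, old.coordinator, old.members);
  }

  /**
   * Addresses to ask about the current coordinator. An authoritative view
   * names the coordinator directly; a hint only says whom to ask.
   */
  static List<String> joinCandidates(View view) {
    return view.isAuthoritative()
        ? Collections.singletonList(view.coordinator)
        : view.members;
  }
}
```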

In real applications this bug isn't likely to be encountered because the
first auto-reconnect attempt doesn't take place for a minute.  The
ClusterStartupRule modifies this default to start reconnecting in 5
seconds, which wasn't giving the cluster enough time to react to the
loss of the old Locator and assign a new membership coordinator.
With these changes the test passes even if the default is reduced to 1
second.
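
For readers who want to reproduce the shortened delay outside the test rule,
a minimal sketch follows. It assumes the delay in question is the one
governed by Geode's `max-wait-time-reconnect` property (documented default
60000 ms); the class and helper names are hypothetical, and this is not how
ClusterStartupRule itself is written.

```java
import java.util.Properties;

/** Hypothetical helper; only the property key is taken from Geode's documentation. */
final class ReconnectDelaySketch {
  // Assumption: this is the property that controls the delay discussed above.
  static final String MAX_WAIT_TIME_RECONNECT = "max-wait-time-reconnect";

  /** Builds member properties with the reconnect wait set to the given milliseconds. */
  static Properties withReconnectWait(long millis) {
    Properties props = new Properties();
    props.setProperty(MAX_WAIT_TIME_RECONNECT, Long.toString(millis));
    return props;
  }
}
```

Calling `ReconnectDelaySketch.withReconnectWait(5000)` would correspond to the
5-second setting mentioned above, and `withReconnectWait(1000)` to the
1-second case.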

Finally, the test was incorrectly using internal APIs to detect whether
the Locator had successfully reconnected. I fixed some of that but
opened GEODE-6312 to track the problem that stopping the old Locator did
not actually stop its cluster configuration service.

Thank you for submitting a contribution to Apache Geode.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

- [x] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?

- [x] Is your initial contribution a single, squashed commit?

- [x] Does `gradlew build` run cleanly?

- [n/a] Have you written or updated unit tests to verify your changes?

- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.


[ Full content available at: https://github.com/apache/geode/pull/3118 ]
This message was relayed via gitbox.apache.org for notifications@geode.apache.org

[GitHub] [geode] bschuchardt closed pull request #3118: GEODE-6309 ClusterConfigLocatorRestartDUnitTest fails to spin up a new server

Posted by "bschuchardt (GitHub)" <gi...@apache.org>.
[ pull request closed by bschuchardt ]
