Posted to issues@geode.apache.org by "Bruce Schuchardt (JIRA)" <ji...@apache.org> on 2017/06/09 17:24:18 UTC
[jira] [Reopened] (GEODE-3052) Restarting 2 locators within 1s of each other causes potential locator split brain
[ https://issues.apache.org/jira/browse/GEODE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Schuchardt reopened GEODE-3052:
-------------------------------------
If there were servers in the recovered view, locators can still come up in a split-brain configuration, even with the fix I checked in yesterday.
{noformat}
locator1/locator1.log: [info 2017/06/09 10:13:29.365 PDT locator1 <main> tid=0x1] Peer locator recovering from /export/trout1/users/bschuchardt/devel/testing/splitbrain/locator1/locator12345view.dat
locator1/locator1.log: [info 2017/06/09 10:13:29.367 PDT locator1 <main> tid=0x1] Peer locator initial membership is View[trout(locator1:9450:locator)<ec><v0>:1024|3] members: [trout(server2:9684)<v2>:1026{lead}, trout(server1:9796)<v3>:1027]
locator2/locator2.log: [info 2017/06/09 10:13:30.093 PDT locator2 <main> tid=0x1] Attempting to join the distributed system through coordinator 10.118.26.122(server2:9684)<v2>:1026 using address 10.118.26.122(locator2:9961:locator)<ec>:1025
locator1/locator1.log: [info 2017/06/09 10:13:38.678 PDT locator1 <main> tid=0x1] This member is becoming the membership coordinator with address 10.118.26.122(locator1:9937:locator)<ec>:1024
locator1/locator1.log: [info 2017/06/09 10:13:38.679 PDT locator1 <main> tid=0x1] received new view: View[10.118.26.122(locator1:9937:locator)<ec><v0>:1024|0] members: [10.118.26.122(locator1:9937:locator)<ec><v0>:1024]
old view is: null
locator1/locator1.log: [info 2017/06/09 10:13:38.679 PDT locator1 <main> tid=0x1] Peer locator received new membership view: View[10.118.26.122(locator1:9937:locator)<ec><v0>:1024|0] members: [10.118.26.122(locator1:9937:locator)<ec><v0>:1024]
locator1/locator1.log: [info 2017/06/09 10:13:38.692 PDT locator1 <main> tid=0x1] ViewCreator starting on:10.118.26.122(locator1:9937:locator)<ec><v0>:1024
locator1/locator1.log: [info 2017/06/09 10:13:38.692 PDT locator1 <Geode Membership View Creator> tid=0x24] View Creator thread is starting
locator1/locator1.log: [info 2017/06/09 10:13:38.693 PDT locator1 <main> tid=0x1] Finished joining (took 9030ms).
locator1/locator1.log: [info 2017/06/09 10:13:38.694 PDT locator1 <Geode Membership View Creator> tid=0x24] no recipients for new view aside from myself
locator1/locator1.log: [info 2017/06/09 10:13:38.695 PDT locator1 <main> tid=0x1] Starting DistributionManager 10.118.26.122(locator1:9937:locator)<ec><v0>:1024. (took 9250 ms)
locator1/locator1.log: [info 2017/06/09 10:13:38.697 PDT locator1 <main> tid=0x1] Initial (distribution manager) view = View[10.118.26.122(locator1:9937:locator)<ec><v0>:1024|0] members: [10.118.26.122(locator1:9937:locator)<ec><v0>:1024]
locator1/locator1.log: [info 2017/06/09 10:13:38.697 PDT locator1 <main> tid=0x1] Admitting member <10.118.26.122(locator1:9937:locator)<ec><v0>:1024>. Now there are 1 non-admin member(s).
locator1/locator1.log: [info 2017/06/09 10:13:38.697 PDT locator1 <main> tid=0x1] 10.118.26.122(locator1:9937:locator)<ec><v0>:1024 is the elder and the only member.
locator1/locator1.log: [info 2017/06/09 10:13:38.700 PDT locator1 <main> tid=0x1] Did not hear back from any other system. I am the first one.
locator1/locator1.log: [info 2017/06/09 10:13:38.726 PDT locator1 <main> tid=0x1] Creating cache for locator.
locator1/locator1.log: [info 2017/06/09 10:13:38.895 PDT locator1 <main> tid=0x1] Requesting cluster configuration
locator1/locator1.log: [info 2017/06/09 10:13:38.982 PDT locator1 <main> tid=0x1] Initializing region _monitoringRegion_10.118.26.122<v0>1024
locator1/locator1.log: [info 2017/06/09 10:13:38.986 PDT locator1 <main> tid=0x1] Initialization of region _monitoringRegion_10.118.26.122<v0>1024 completed
locator2/locator2.log: [info 2017/06/09 10:13:39.103 PDT locator2 <main> tid=0x1] This member is becoming the membership coordinator with address 10.118.26.122(locator2:9961:locator)<ec>:1025
locator2/locator2.log: [info 2017/06/09 10:13:39.103 PDT locator2 <main> tid=0x1] received new view: View[10.118.26.122(locator2:9961:locator)<ec><v0>:1025|0] members: [10.118.26.122(locator2:9961:locator)<ec><v0>:1025]
old view is: null
locator2/locator2.log: [info 2017/06/09 10:13:39.104 PDT locator2 <main> tid=0x1] Peer locator received new membership view: View[10.118.26.122(locator2:9961:locator)<ec><v0>:1025|0] members: [10.118.26.122(locator2:9961:locator)<ec><v0>:1025]
locator2/locator2.log: [info 2017/06/09 10:13:39.117 PDT locator2 <main> tid=0x1] ViewCreator starting on:10.118.26.122(locator2:9961:locator)<ec><v0>:1025
locator2/locator2.log: [info 2017/06/09 10:13:39.118 PDT locator2 <main> tid=0x1] Finished joining (took 9033ms).
locator2/locator2.log: [info 2017/06/09 10:13:39.119 PDT locator2 <main> tid=0x1] Starting DistributionManager 10.118.26.122(locator2:9961:locator)<ec><v0>:1025. (took 9247 ms)
locator2/locator2.log: [info 2017/06/09 10:13:39.120 PDT locator2 <Geode Membership View Creator> tid=0x24] View Creator thread is starting
locator2/locator2.log: [info 2017/06/09 10:13:39.121 PDT locator2 <Geode Membership View Creator> tid=0x24] no recipients for new view aside from myself
{noformat}
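In the excerpt above, locator1 recovers a view whose only other members are the two servers, and locator2 tries to join through server2. Roughly nine seconds later, each locator declares itself membership coordinator and installs a view with id 0 that contains only itself. The following minimal sketch shows one way to spot that pattern across a set of locator logs. It assumes the `received new view: View[<creator>|<id>] members: [...]` line format seen in this excerpt. The regex and the function names `initial_view_creators` and `looks_like_split_brain` are hypothetical and are not part of Geode.

```python
import re

# Heuristic pattern for membership log lines such as:
#   ... received new view: View[<creator>|<viewId>] members: [<m1>, <m2>]
# The format is inferred from the log excerpt above (an assumption, not a spec).
VIEW_RE = re.compile(
    r"received new view: View\[(?P<creator>[^|\]]+)\|(?P<view_id>\d+)\] "
    r"members: \[(?P<members>[^\]]*)\]"
)


def initial_view_creators(lines):
    """Return the set of members that logged a view with id 0.

    In the excerpt above, a view id of 0 appears when a member forms a new
    distributed system on its own instead of joining an existing one.
    """
    creators = set()
    for line in lines:
        m = VIEW_RE.search(line)
        if m and int(m.group("view_id")) == 0:
            creators.add(m.group("creator"))
    return creators


def looks_like_split_brain(lines):
    """Heuristic: two or more independent view-0 creators in one restart window."""
    return len(initial_view_creators(lines)) > 1


if __name__ == "__main__":
    sample = [
        "locator1/locator1.log: [info 2017/06/09 10:13:38.679 PDT locator1 <main> tid=0x1] "
        "received new view: View[10.118.26.122(locator1:9937:locator)<ec><v0>:1024|0] "
        "members: [10.118.26.122(locator1:9937:locator)<ec><v0>:1024]",
        "locator2/locator2.log: [info 2017/06/09 10:13:39.103 PDT locator2 <main> tid=0x1] "
        "received new view: View[10.118.26.122(locator2:9961:locator)<ec><v0>:1025|0] "
        "members: [10.118.26.122(locator2:9961:locator)<ec><v0>:1025]",
    ]
    print(sorted(initial_view_creators(sample)))
    print(looks_like_split_brain(sample))
```

Applied to the two `received new view` lines from locator1 and locator2, this reports both locators as view-0 creators. The check is only meaningful if the logs cover a single restart window. An earlier, legitimate full-cluster restart would also produce one view-0 creator and could cause a false positive.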
> Restarting 2 locators within 1s of each other causes potential locator split brain
> ----------------------------------------------------------------------------------
>
> Key: GEODE-3052
> URL: https://issues.apache.org/jira/browse/GEODE-3052
> Project: Geode
> Issue Type: Bug
> Components: locator
> Affects Versions: 1.1.1
> Reporter: Udo Kohlmeyer
> Assignee: Bruce Schuchardt
> Fix For: 1.2.0
>
>
> Using the artifacts from GEODE-3003, it is possible to cause a locator split brain on locator startup. This seems to happen only when the locators start less than 1s apart.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)