Posted to issues@geode.apache.org by "Bruce Schuchardt (JIRA)" <ji...@apache.org> on 2019/03/28 17:43:00 UTC

[jira] [Created] (GEODE-6570) processing of cached join request delays view installation

Bruce Schuchardt created GEODE-6570:
---------------------------------------

             Summary: processing of cached join request delays view installation
                 Key: GEODE-6570
                 URL: https://issues.apache.org/jira/browse/GEODE-6570
             Project: Geode
          Issue Type: Bug
          Components: membership
            Reporter: Bruce Schuchardt


In a test that kills and restarts locators, one of the restarting locators times out trying to join the distributed system. Logs show that another locator was taking over as membership coordinator and was delayed in sending out a membership view because it processed a join request from a member that was already in the distributed system.

locator A gets a join request from node 1 and sends a PREPARE

node 1 sets its identity's view ID using the PREPAREd view

locator A is killed

node 1 sends a join request to locator B. Its identity still carries the view ID it took from locator A's PREPARE.

node 2 sends a join request to locator B and gets a PREPARE

locator B processes node 1's join request and assigns a new view ID to it

locator B processes node 2's join request and assigns a new view ID to it

locator B sends a PREPARE containing these two new nodes. The view also still contains node 1's original ID.

locator B times out waiting for a response from node 1 under the new view ID and declares it crashed. It sends out a new PREPARE without that address.

node 2 gives up waiting for the view

locator B gets no response from node 2 and declares it crashed, then sends out a new PREPARE without node 2, which succeeds.
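The core of the sequence above can be modelled with a toy coordinator. This is a hypothetical sketch, not Geode's membership code: `MemberId`, `prepare_view`, the simplified host name "host2" and the rule that every join request receives the new view's ID are assumptions chosen to mirror the steps, with pids, ports and view IDs borrowed from the log excerpt for illustration. Because the member ID includes the view ID, node 1's stale ID (v46) and its re-assigned ID (v60) are treated as different members, so the prepared view lists node 1 twice.

```python
from typing import List, NamedTuple, Optional


class MemberId(NamedTuple):
    """Hypothetical, simplified member identity: address plus the view ID it joined in."""
    host: str
    pid: int
    port: int
    view_id: Optional[int]  # None until a PREPAREd view assigns one


def prepare_view(members: List[MemberId], join_requests: List[MemberId],
                 view_id: int) -> List[MemberId]:
    """Naive (hypothetical) coordinator: give each join request the new view ID and append it.

    Equality includes view_id, so a requester that already carries a view ID
    from an earlier PREPARE is not recognised as an existing member.
    """
    view = list(members)
    for req in join_requests:
        new_id = req._replace(view_id=view_id)
        if new_id not in view:
            view.append(new_id)
    return view


# Node 1 took view ID 46 from locator A's PREPARE before locator A was killed,
# and locator B already has that original ID.
node1_stale = MemberId("host2", 616, 41004, 46)
node2 = MemberId("host2", 746, 41002, None)
locator_b = MemberId("host2", 29835, 41001, 24)

view60 = prepare_view([locator_b, node1_stale], [node1_stale, node2], 60)
for m in view60:
    print(m)

# Entries that share an address (host, pid, port) but differ in view ID:
addresses = [(m.host, m.pid, m.port) for m in view60]
duplicates = {a for a in addresses if addresses.count(a) > 1}
print("duplicated addresses:", duplicates)
```

The duplicate only becomes visible once the view ID is ignored, which is what the last two statements do.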

Here are log snippets showing the problem. Process 616 has a JoinRequest queued when this locator becomes coordinator. The member ID in that JoinRequest already includes v46, showing that a PREPARE containing this member had already been sent.

The locator then creates a new view that contains process 616's ID twice: once with v46 and once with v60.
{noformat}
locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] processing request JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004) failureDetectionPort:43747
locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] processing request JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec>:41002) failureDetectionPort:52188

locatorgemfire_2_2_29835/system.log: [info 2019/03/27 22:22:22.818 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] preparing new view View[rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001|60] members: [rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_30052:30052)<ec><v25>:41007{lead}, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_4_host2_31300:31300:locator)<ec><v29>:41003, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_1_host2_31671:31671:locator)<ec><v41>:41000, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_31856:31856)<ec><v42>:41006, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_32560:32560)<ec><v44>:41005, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v60>:41004, rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec><v60>:41002]

{noformat}
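The duplicate in view 60 can also be spotted mechanically. The following is a hypothetical log-analysis helper, not part of Geode: the regular expression `MEMBER_RE` and the function `find_duplicate_members` are assumptions based only on the member-ID format visible in the log lines above (host, then name and pid in parentheses, optional `<ec>` and `<vN>` tags, then the port). It groups the members by everything except the view ID and reports any member listed more than once.

```python
import re
from collections import defaultdict

# Hypothetical pattern for member IDs as printed in the log, e.g.
# rs-GEM-...-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004
# Groups: host, member name, pid, optional view ID, membership port.
MEMBER_RE = re.compile(
    r"([\w.-]+)\(([^:)]+):(\d+)(?::\w+)?\)(?:<ec>)?(?:<v(\d+)>)?:(\d+)"
)


def find_duplicate_members(view_members: str):
    """Group member IDs by host, name, pid and port (ignoring the view ID)
    and return only the groups that occur more than once."""
    by_address = defaultdict(list)
    for host, name, pid, view_id, port in MEMBER_RE.findall(view_members):
        key = (host, name, int(pid), int(port))
        by_address[key].append(int(view_id) if view_id else None)
    return {k: v for k, v in by_address.items() if len(v) > 1}


# Excerpt of the "members: [...]" list from the view-60 log line above.
members = (
    "rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001, "
    "rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_30052:30052)<ec><v25>:41007{lead}, "
    "rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004, "
    "rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v60>:41004, "
    "rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec><v60>:41002"
)

for (host, name, pid, port), view_ids in find_duplicate_members(members).items():
    print(f"{name} (pid {pid}, port {port}) appears with view IDs {view_ids}")
```

Run against this excerpt, it reports only process 616, listed with view IDs 46 and 60.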



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)