You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/01/17 12:32:00 UTC

[jira] [Commented] (HDDS-6191) Intermittent failure in TestDeleteWithSlowFollower

    [ https://issues.apache.org/jira/browse/HDDS-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477168#comment-17477168 ] 

Attila Doroszlai commented on HDDS-6191:
----------------------------------------

{{c7514957}} votes for {{18b95dfb}}:

{noformat}
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO  server.RaftServer$Division (RaftServerImpl.java:requestVote(1046)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: receive requestVote(ELECTION, 18b95dfb-8ef7-4011-a842-0dc8d0be8521, group-9EECFC4A6075, 2, (t:0, i:0))

2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO  impl.VoteContext (VoteContext.java:log(48)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FOLLOWER: accept ELECTION from 18b95dfb-8ef7-4011-a842-0dc8d0be8521: our priority 0 <= candidate's priority 1
{noformat}

but just after 3ms starts new election due to RPC timeout:

{noformat}
2022-01-07 01:24:07,795 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO  impl.FollowerState (FollowerState.java:run(134)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5172276476ns, electionTimeout:5172ms

2022-01-07 01:24:07,796 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO  server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: changes role from  FOLLOWER to CANDIDATE at term 2 for changeToCandidate

2022-01-07 01:24:07,805 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7] INFO  impl.LeaderElection (LeaderElection.java:askForVotes(306)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7 ELECTION round 0: submit vote requests at term 3 for -1: [18b95dfb-8ef7-4011-a842-0dc8d0be8521|rpc:10.1.0.7:39307|priority:1, c7514957-7287-4cfe-8c11-55f780daee7f|rpc:10.1.0.7:42065|priority:0, baceac6d-bbfe-42b4-9bde-396690ef56d9|rpc:10.1.0.7:38573|priority:0], old=null
{noformat}

Meanwhile {{18b95dfb}} becomes leader with vote from {{c7514957}}:

{noformat}
2022-01-07 01:24:07,805 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  impl.LeaderElection (LeaderElection.java:logAndReturn(89)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6: ELECTION PASSED received 1 response(s) and 0 exception(s):

2022-01-07 01:24:07,814 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from CANDIDATE to LEADER at term 2 for changeToLeader

2022-01-07 01:24:07,814 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  ratis.XceiverServerRatis (XceiverServerRatis.java:handleLeaderChangedNotification(866)) - Leader change notification received for group: group-9EECFC4A6075 with new leaderId: 18b95dfb-8ef7-4011-a842-0dc8d0be8521

2022-01-07 01:24:07,815 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO  server.RaftServer$Division (ServerState.java:setLeader(285)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from null to 18b95dfb-8ef7-4011-a842-0dc8d0be8521 at term 2 for becomeLeader, leader elected after 9549ms
{noformat}

so pipeline is opened and SCM exits safe mode:

{noformat}
2022-01-07 01:24:07,818 [EventQueue-PipelineReportForPipelineReportHandler] INFO  pipeline.PipelineManagerImpl (PipelineManagerImpl.java:openPipeline(291)) - Pipeline Pipeline[ Id: b1f049ef-67e6-4b7d-89d8-9eecfc4a6075, ... leaderId:18b95dfb-8ef7-4011-a842-0dc8d0be8521, ... moved to OPEN state
...
2022-01-07 01:24:07,819 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] INFO  safemode.SCMSafeModeManager (SCMSafeModeManager.java:exitSafeMode(248)) - SCM exiting safe mode.
{noformat}

But {{c7514957}}'s vote request makes {{18b95dfb}} step down from leadership:

{noformat}
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO  impl.VoteContext (VoteContext.java:log(48)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LEADER: reject ELECTION from c7514957-7287-4cfe-8c11-55f780daee7f: our last entry (t:2, i:0) > candidate's last entry (t:0, i:0)
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO  server.RaftServer$Division (ServerState.java:setLeader(285)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from 18b95dfb-8ef7-4011-a842-0dc8d0be8521 to null at term 3 for updateCurrentTerm
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO  server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from    LEADER to FOLLOWER at term 3 for candidate:c7514957-7287-4cfe-8c11-55f780daee7f
{noformat}

The result is that pipeline is open without leader, which is normally OK (as client should retry until a new leader is elected).  But the test makes an assertion about leader being present, since it has waited for the pipeline to open.

> Intermittent failure in TestDeleteWithSlowFollower
> --------------------------------------------------
>
>                 Key: HDDS-6191
>                 URL: https://issues.apache.org/jira/browse/HDDS-6191
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Attila Doroszlai
>            Assignee: Attila Doroszlai
>            Priority: Major
>
> {code}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 45.297 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
> testDeleteKeyWithSlowFollower  Time elapsed: 0.193 s  <<< FAILURE!
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:87)
> 	at org.junit.Assert.assertTrue(Assert.java:42)
> 	at org.junit.Assert.assertNotNull(Assert.java:713)
> 	at org.junit.Assert.assertNotNull(Assert.java:723)
> 	at org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.testDeleteKeyWithSlowFollower(TestDeleteWithSlowFollower.java:237)
> {code}
> This was originally reported in HDDS-3762.  Then it seemed to be OK for extended period, but appeared again:
> https://github.com/elek/ozone-build-results/blob/master/2021/04/30/7624/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/05/19/7973/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/07/15/8962/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/08/23/9889/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/07/10146/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/09/10201/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/13/10242/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2022/01/07/12409/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org