You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/01/17 12:32:00 UTC
[jira] [Commented] (HDDS-6191) Intermittent failure in TestDeleteWithSlowFollower
[ https://issues.apache.org/jira/browse/HDDS-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477168#comment-17477168 ]
Attila Doroszlai commented on HDDS-6191:
----------------------------------------
{{c7514957}} votes for {{18b95dfb}}:
{noformat}
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO server.RaftServer$Division (RaftServerImpl.java:requestVote(1046)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: receive requestVote(ELECTION, 18b95dfb-8ef7-4011-a842-0dc8d0be8521, group-9EECFC4A6075, 2, (t:0, i:0))
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO impl.VoteContext (VoteContext.java:log(48)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FOLLOWER: accept ELECTION from 18b95dfb-8ef7-4011-a842-0dc8d0be8521: our priority 0 <= candidate's priority 1
{noformat}
but just after 3ms starts new election due to RPC timeout:
{noformat}
2022-01-07 01:24:07,795 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO impl.FollowerState (FollowerState.java:run(134)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5172276476ns, electionTimeout:5172ms
2022-01-07 01:24:07,796 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: changes role from FOLLOWER to CANDIDATE at term 2 for changeToCandidate
2022-01-07 01:24:07,805 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7] INFO impl.LeaderElection (LeaderElection.java:askForVotes(306)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7 ELECTION round 0: submit vote requests at term 3 for -1: [18b95dfb-8ef7-4011-a842-0dc8d0be8521|rpc:10.1.0.7:39307|priority:1, c7514957-7287-4cfe-8c11-55f780daee7f|rpc:10.1.0.7:42065|priority:0, baceac6d-bbfe-42b4-9bde-396690ef56d9|rpc:10.1.0.7:38573|priority:0], old=null
{noformat}
Meanwhile {{18b95dfb}} becomes leader with vote from {{c7514957}}:
{noformat}
2022-01-07 01:24:07,805 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO impl.LeaderElection (LeaderElection.java:logAndReturn(89)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6: ELECTION PASSED received 1 response(s) and 0 exception(s):
2022-01-07 01:24:07,814 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from CANDIDATE to LEADER at term 2 for changeToLeader
2022-01-07 01:24:07,814 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO ratis.XceiverServerRatis (XceiverServerRatis.java:handleLeaderChangedNotification(866)) - Leader change notification received for group: group-9EECFC4A6075 with new leaderId: 18b95dfb-8ef7-4011-a842-0dc8d0be8521
2022-01-07 01:24:07,815 [18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LeaderElection6] INFO server.RaftServer$Division (ServerState.java:setLeader(285)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from null to 18b95dfb-8ef7-4011-a842-0dc8d0be8521 at term 2 for becomeLeader, leader elected after 9549ms
{noformat}
so pipeline is opened and SCM exits safe mode:
{noformat}
2022-01-07 01:24:07,818 [EventQueue-PipelineReportForPipelineReportHandler] INFO pipeline.PipelineManagerImpl (PipelineManagerImpl.java:openPipeline(291)) - Pipeline Pipeline[ Id: b1f049ef-67e6-4b7d-89d8-9eecfc4a6075, ... leaderId:18b95dfb-8ef7-4011-a842-0dc8d0be8521, ... moved to OPEN state
...
2022-01-07 01:24:07,819 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] INFO safemode.SCMSafeModeManager (SCMSafeModeManager.java:exitSafeMode(248)) - SCM exiting safe mode.
{noformat}
But {{c7514957}}'s vote request makes {{18b95dfb}} step down from leadership:
{noformat}
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO impl.VoteContext (VoteContext.java:log(48)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075-LEADER: reject ELECTION from c7514957-7287-4cfe-8c11-55f780daee7f: our last entry (t:2, i:0) > candidate's last entry (t:0, i:0)
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO server.RaftServer$Division (ServerState.java:setLeader(285)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: change Leader from 18b95dfb-8ef7-4011-a842-0dc8d0be8521 to null at term 3 for updateCurrentTerm
2022-01-07 01:24:07,868 [grpc-default-executor-0] INFO server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - 18b95dfb-8ef7-4011-a842-0dc8d0be8521@group-9EECFC4A6075: changes role from LEADER to FOLLOWER at term 3 for candidate:c7514957-7287-4cfe-8c11-55f780daee7f
{noformat}
The result is that pipeline is open without leader, which is normally OK (as client should retry until a new leader is elected). But the test makes an assertion about leader being present, since it has waited for the pipeline to open.
> Intermittent failure in TestDeleteWithSlowFollower
> --------------------------------------------------
>
> Key: HDDS-6191
> URL: https://issues.apache.org/jira/browse/HDDS-6191
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
>
> {code}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 45.297 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
> testDeleteKeyWithSlowFollower Time elapsed: 0.193 s <<< FAILURE!
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:87)
> at org.junit.Assert.assertTrue(Assert.java:42)
> at org.junit.Assert.assertNotNull(Assert.java:713)
> at org.junit.Assert.assertNotNull(Assert.java:723)
> at org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.testDeleteKeyWithSlowFollower(TestDeleteWithSlowFollower.java:237)
> {code}
> This was originally reported in HDDS-3762. Then it seemed to be OK for extended period, but appeared again:
> https://github.com/elek/ozone-build-results/blob/master/2021/04/30/7624/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/05/19/7973/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/07/15/8962/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/08/23/9889/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/07/10146/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/09/10201/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2021/09/13/10242/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
> https://github.com/elek/ozone-build-results/blob/master/2022/01/07/12409/it-client/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.txt
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org