Posted to issues@ratis.apache.org by "runzhiwang (Jira)" <ji...@apache.org> on 2020/04/21 13:32:00 UTC
[jira] [Resolved] (RATIS-859) Infinite leader election in ozone
[ https://issues.apache.org/jira/browse/RATIS-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
runzhiwang resolved RATIS-859.
------------------------------
Resolution: Not A Problem
> Infinite leader election in ozone
> ---------------------------------
>
> Key: RATIS-859
> URL: https://issues.apache.org/jira/browse/RATIS-859
> Project: Ratis
> Issue Type: Bug
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png
>
>
> I also opened the same issue in Ozone: https://issues.apache.org/jira/browse/HDDS-3459. I think both Ozone and Ratis should prevent this from happening.
> *What's the problem?*
> There are 3 datanodes in a group: leader, follower1, and follower2. The steps to reproduce the problem are as follows:
> 1. follower2 reports close pipeline.
> 2. SCM sends the close pipeline command.
> 3. leader and follower1 remove the group, but follower2 gets a socket timeout and does not remove the group.
> 4. follower2 then runs LeaderElection indefinitely, for at least 6 hours, while leader and follower1 respond with "group not found".
> The following screenshots show each step.
> 1. follower2 reports close pipeline:
> !screenshot-1.png!
> 2. SCM closes the pipeline:
> !screenshot-2.png!
> !screenshot-3.png!
> 3. leader removes the group:
> !screenshot-4.png!
> follower1 removes the group:
> !screenshot-5.png!
> follower2 gets a socket timeout:
> !screenshot-6.png!
> 4. follower2 then runs LeaderElection indefinitely, for at least 6 hours:
> !screenshot-7.png!
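
The reproduction steps in the description can be sketched as a toy simulation. This is only an illustration of why the election never terminates, not Ratis or Ozone code: the names `Node`, `handle_request_vote`, `run_election_round`, `simulate`, and the group id `pipeline-1` are all hypothetical, and the term handling follows textbook Raft (the candidate bumps its term on every attempt) rather than anything confirmed from the Ratis source.

```python
from dataclasses import dataclass, field

GROUP_NOT_FOUND = "GROUP_NOT_FOUND"
VOTE_GRANTED = "VOTE_GRANTED"
VOTE_REJECTED = "VOTE_REJECTED"


@dataclass
class Node:
    """Hypothetical stand-in for a datanode; not a Ratis class."""
    name: str
    groups: set = field(default_factory=set)
    term: int = 0

    def remove_group(self, group_id):
        self.groups.discard(group_id)

    def handle_request_vote(self, group_id, candidate_term):
        # A peer that has already removed the group cannot vote in it.
        if group_id not in self.groups:
            return GROUP_NOT_FOUND
        if candidate_term > self.term:
            self.term = candidate_term
            return VOTE_GRANTED
        return VOTE_REJECTED


def run_election_round(candidate, peers, group_id):
    """One election attempt: bump the term, ask every peer, count votes."""
    candidate.term += 1
    replies = [p.handle_request_vote(group_id, candidate.term) for p in peers]
    votes = 1 + replies.count(VOTE_GRANTED)  # the candidate votes for itself
    majority = (len(peers) + 1) // 2 + 1
    return votes >= majority, replies


def simulate(rounds=5):
    group = "pipeline-1"  # assumed id, for illustration only
    leader = Node("leader", {group})
    follower1 = Node("follower1", {group})
    follower2 = Node("follower2", {group})

    # Steps 1-3: the close pipeline command reaches leader and follower1,
    # but follower2 times out and keeps the group.
    leader.remove_group(group)
    follower1.remove_group(group)

    # Step 4: follower2 keeps starting elections it can never win.
    history = []
    for _ in range(rounds):
        won, replies = run_election_round(follower2, [leader, follower1], group)
        history.append((follower2.term, won, replies))
    return history


if __name__ == "__main__":
    for term, won, replies in simulate():
        print(term, won, replies)
```

In this sketch follower2 only ever holds its own vote, so it never reaches the majority of 2 out of 3, and nothing in the "group not found" replies tells it to stop retrying.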
--
This message was sent by Atlassian Jira
(v8.3.4#803005)