Posted to issues@ratis.apache.org by "runzhiwang (Jira)" <ji...@apache.org> on 2020/04/21 13:32:00 UTC

[jira] [Resolved] (RATIS-859) Infinite leader election in ozone

     [ https://issues.apache.org/jira/browse/RATIS-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

runzhiwang resolved RATIS-859.
------------------------------
    Resolution: Not A Problem

> Infinite leader election in ozone
> ---------------------------------
>
>                 Key: RATIS-859
>                 URL: https://issues.apache.org/jira/browse/RATIS-859
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png
>
>
> I have also opened the same issue in Ozone: https://issues.apache.org/jira/browse/HDDS-3459. I think both Ozone and Ratis should prevent this from happening.
> *What's the problem?*
> There are 3 datanodes in a group: leader, follower1 and follower2. The steps to reproduce the problem are as follows:
> 1. follower2 reports close pipeline to SCM
> 2. SCM sends the close pipeline command
> 3. leader and follower1 remove the group, but follower2 hits a socket timeout and does not remove the group
> 4. follower2 then runs leader election endlessly for at least 6 hours, while leader and follower1 respond with "group not found"
> You can see this in the following screenshots.
> 1. follower2 reports close pipeline:
>  !screenshot-1.png! 
> 2. SCM closes the pipeline:
>  !screenshot-2.png! 
>  !screenshot-3.png! 
> 3. leader removes the group:
>  !screenshot-4.png! 
>    follower1 removes the group:
>  !screenshot-5.png! 
>    follower2 hits a socket timeout:
>  !screenshot-6.png! 
> 4. follower2 then runs leader election endlessly for at least 6 hours:
>  !screenshot-7.png! 
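The reproduction steps above describe a candidate that can never reach a majority. Below is a minimal Python sketch of that failure mode. It is a simplified vote model, not Ratis code, and it ignores terms, logs and election timeouts. The names `Peer`, `remove_group`, `request_vote`, `run_election`, the group id `pipeline-1`, and the `max_rounds` cap are all hypothetical. `max_rounds` exists only to keep the demo finite, because the reported behaviour had no bound.

```python
from dataclasses import dataclass, field
from enum import Enum


class VoteReply(Enum):
    GRANTED = "granted"
    GROUP_NOT_FOUND = "group not found"


@dataclass
class Peer:
    name: str
    groups: set = field(default_factory=set)

    def remove_group(self, group_id: str) -> None:
        # Step 3: the peer processes the close pipeline command and drops the group.
        self.groups.discard(group_id)

    def request_vote(self, group_id: str) -> VoteReply:
        # A peer that no longer hosts the group can only answer "group not found".
        if group_id not in self.groups:
            return VoteReply.GROUP_NOT_FOUND
        return VoteReply.GRANTED


def run_election(candidate: Peer, others: list, group_id: str, max_rounds: int):
    """Simulate election rounds; return (elected, rounds_tried, not_found_replies).

    max_rounds is a hypothetical cap for this demo only.
    """
    majority = (len(others) + 1) // 2 + 1  # 2 for a 3-node group
    not_found = 0
    for round_no in range(1, max_rounds + 1):
        votes = 1  # the candidate always votes for itself
        for peer in others:
            if peer.request_vote(group_id) is VoteReply.GRANTED:
                votes += 1
            else:
                not_found += 1
        if votes >= majority:
            return True, round_no, not_found
    return False, max_rounds, not_found


group = "pipeline-1"  # hypothetical group id
leader = Peer("leader", {group})
follower1 = Peer("follower1", {group})
follower2 = Peer("follower2", {group})

# Steps 2-3: leader and follower1 remove the group; follower2 times out and keeps it.
leader.remove_group(group)
follower1.remove_group(group)

elected, rounds, not_found = run_election(follower2, [leader, follower1], group, max_rounds=5)
print(elected, rounds, not_found)  # False 5 10
```

In this model follower2 only ever collects its own vote, which is below the majority of 2 in a 3-node group. Every round therefore fails with two "group not found" replies, and without an external cap the candidate keeps retrying.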



--
This message was sent by Atlassian Jira
(v8.3.4#803005)