Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/03/06 20:41:33 UTC

[jira] [Updated] (KUDU-1391) 2 of 3 replica alive but failed to elect leader

     [ https://issues.apache.org/jira/browse/KUDU-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1391:
------------------------------
    Component/s: consensus

> 2 of 3 replica alive but failed to elect leader
> -----------------------------------------------
>
>                 Key: KUDU-1391
>                 URL: https://issues.apache.org/jira/browse/KUDU-1391
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>            Reporter: Binglin Chang
>         Attachments: 6a32cfa0353e4175809c2aa67e16ac9e.log.st172, 6a32cfa0353e4175809c2aa67e16ac9e.log.st212, 6a32cfa0353e4175809c2aa67e16ac9e.log.st212.before, 6a32cfa0353e4175809c2aa67e16ac9e.log.st216, remote-bootstrap-tool.patch
>
>
> Last weekend many TS had a lot of "too many open files" errors (haven't upgraded to , when using our internal deploy tool to restart the cluster (stop all TS, then start all TS), the control machine had an issue that seemed to block on writes to the ssh terminal (maybe a USB driver issue, not related to this bug), so only half (about 30) of the TS were shut down. After maybe 10 minutes, I switched to another control host and performed the whole restart.
> Then I saw that writes were blocked because 1 tablet had no leader. In the web UI, 2 of its 3 replicas were in the follower state and 1 was TABLET_DATA_TOMBSTONED, but all elections failed. I will attach the logs of the 2 followers.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)