Posted to issues@kudu.apache.org by "zhangsong (JIRA)" <ji...@apache.org> on 2017/01/16 07:14:26 UTC
[jira] [Commented] (KUDU-1579) into "safe mode" when large number of node crash
[ https://issues.apache.org/jira/browse/KUDU-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823558#comment-15823558 ]
zhangsong commented on KUDU-1579:
---------------------------------
After experiencing several node failure cases (using kudu-tserver revision b906affcdee3ec814c9e96d35fea715fdbb4c330-dirty), I found these two facts.
1 When multiple kudu-tserver nodes crash at around the same time (not exactly the same time; let's say 5 kudu nodes), there will be failed tablets. The reasons for the failed tablets should be those described in issue KUDU-1449. Also, in the kudu-master UI I can see a lot of addServer/removeServer tasks hanging there, with no sign that they will recover automatically.
2 When facing multiple node crashes, if I stop kudu-master until the whole cluster is stable (no more node crashes) and then restart kudu-master, then after recovering all the crashed kudu-tserver nodes no failed tablets are found.
So in my case, it seems kudu-master should freeze for some time when multiple nodes crash at around the same time (e.g. within some period of time). "Freeze" here means that it stops servicing addServer/RemoveServer RPCs.
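To make the idea concrete, here is a minimal sketch in Python of what such a "freeze" decision could look like. It is only an illustration: the class name, the method names, and the three thresholds are hypothetical and are not part of Kudu, and the real master would have to hook this into its own failure detection. The sketch assumes the master counts tserver failures in a sliding time window, freezes once too many occur, and unfreezes after a quiet period with no new failures.

```python
from collections import deque


class SafeModeGate:
    """Hypothetical gate deciding whether addServer/removeServer may run.

    Not a Kudu API. Timestamps are passed in explicitly (in seconds) so
    the logic can be exercised without a real clock.
    """

    def __init__(self, max_failures, window_secs, quiet_secs):
        self.max_failures = max_failures  # failures in the window that trigger a freeze
        self.window_secs = window_secs    # length of the sliding window
        self.quiet_secs = quiet_secs      # failure-free time required to unfreeze
        self._failures = deque()          # timestamps of recent tserver failures
        self._last_failure = None
        self._frozen = False

    def _expire(self, now):
        # Drop failures that have fallen out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_secs:
            self._failures.popleft()

    def record_failure(self, now):
        # Called when a tserver is considered crashed.
        self._failures.append(now)
        self._last_failure = now
        self._expire(now)
        if len(self._failures) >= self.max_failures:
            self._frozen = True

    def allow_config_change(self, now):
        # True if an addServer/removeServer request may be serviced now.
        self._expire(now)
        if self._frozen and now - self._last_failure >= self.quiet_secs:
            self._frozen = False
        return not self._frozen
```

With, say, `SafeModeGate(max_failures=3, window_secs=60, quiet_secs=120)`, a single crash would still be re-replicated as today, while three crashes within a minute would hold back config changes until two minutes pass without another crash. The numbers are placeholders; suitable values would depend on the cluster.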
Just some thoughts for today; I may complete this later.
> into "safe mode" when large number of node crash
> --------------------------------------------------
>
> Key: KUDU-1579
> URL: https://issues.apache.org/jira/browse/KUDU-1579
> Project: Kudu
> Issue Type: New Feature
> Reporter: zhangsong
>
> Currently, replication happens whenever a node crashes.
> However, when a large number of nodes crash, this leads to a replication storm,
> which causes a mess and data loss.
> Replication should be prudent, and the cluster should go into a "safe mode" in the above node crash case.