Posted to issues@ratis.apache.org by "Rajeshbabu Chintaguntla (Jira)" <ji...@apache.org> on 2019/09/03 11:36:01 UTC

[jira] [Commented] (RATIS-556) Detect node failures and close the log to prevent additional writes

    [ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921350#comment-16921350 ] 

Rajeshbabu Chintaguntla commented on RATIS-556:
-----------------------------------------------

bq. We should get a big logger WARN message saying that we're closing the log.
Done.
bq. Should we be checking anything in this Reply?
The reply is now checked for failures, and the call is retried on exception.
bq. This isn't quite right. (new PeerGroups[1]).length is always greater than 0, but peerGroupsToRemove[0] may be null. Make this a List and just append (potentially) multiple PeerGroups to it?
Correct, [~elserj]. Fixed it.
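
To illustrate the point in the review comment above, here is a minimal sketch of why a one-element array cannot signal "nothing to remove" and how a List avoids the problem. The PeerGroup class, its fields, and the method names below are hypothetical stand-ins and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PeerGroupRemovalSketch {
    // Hypothetical stand-in for the peer group type used in the patch.
    static final class PeerGroup {
        final String name;
        final Set<String> peers;

        PeerGroup(String name, String... peers) {
            this.name = name;
            this.peers = new HashSet<>(Arrays.asList(peers));
        }
    }

    // Pitfall: a one-element array has length 1 whether or not its slot
    // was ever filled, so a length check says nothing about its contents.
    static boolean arrayHolderLooksNonEmpty(PeerGroup maybeNull) {
        PeerGroup[] holder = new PeerGroup[1];
        holder[0] = maybeNull;
        return holder.length > 0; // true even when holder[0] is null
    }

    // Alternative: collect every group that contains the failed peer.
    // An empty list means there is nothing to remove.
    static List<PeerGroup> groupsToRemove(List<PeerGroup> groups, String failedPeer) {
        List<PeerGroup> result = new ArrayList<>();
        for (PeerGroup group : groups) {
            if (group.peers.contains(failedPeer)) {
                result.add(group);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PeerGroup> groups = Arrays.asList(
            new PeerGroup("g1", "a", "b"),
            new PeerGroup("g2", "b", "c"));
        System.out.println(arrayHolderLooksNonEmpty(null));        // length check passes anyway
        System.out.println(groupsToRemove(groups, "b").size());    // both groups contain "b"
        System.out.println(groupsToRemove(groups, "z").isEmpty()); // no group contains "z"
    }
}
```

With the List, the caller can test isEmpty() and iterate over any number of groups without a null check.
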
bq. Would it be possible to modify that test or add a new test which makes sure that the contents of each data structure we maintain are kept in sync? I am talking about map, peers, peerLogs, heartbeatInfo and avail. However you think easiest to test it would be good. We wouldn't want these data structures to drift and become out of sync (as they would just leak memory).
The test now checks that the peers are in sync across these data structures both before and after closing the log.
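
A sync check of this kind could look roughly like the sketch below. It reuses the names peers, peerLogs, heartbeatInfo and avail from the review comment, but the element types, the class, and every method name are assumptions made for illustration. The actual structures in the patch may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PeerTrackingSketch {
    // Assumed shapes: peers are identified by a String id in every structure.
    private final Set<String> peers = new HashSet<>();
    private final Map<String, Set<String>> peerLogs = new HashMap<>();
    private final Map<String, Long> heartbeatInfo = new HashMap<>();
    private final Set<String> avail = new HashSet<>();

    // Registers a peer in all structures at once.
    void addPeer(String peer, long heartbeatMillis) {
        peers.add(peer);
        peerLogs.put(peer, new HashSet<>());
        heartbeatInfo.put(peer, heartbeatMillis);
        avail.add(peer);
    }

    // Removes a failed peer from all structures at once.
    void removePeer(String peer) {
        peers.remove(peer);
        peerLogs.remove(peer);
        heartbeatInfo.remove(peer);
        avail.remove(peer);
    }

    // Simulates a drift bug for testing: only one structure is updated.
    void dropHeartbeatOnly(String peer) {
        heartbeatInfo.remove(peer);
    }

    // True when both maps track exactly the known peers and
    // avail holds no peer that is otherwise unknown.
    boolean inSync() {
        return peers.equals(peerLogs.keySet())
            && peers.equals(heartbeatInfo.keySet())
            && peers.containsAll(avail);
    }

    int peerCount() {
        return peers.size();
    }
}
```

A test can call inSync() before and after the step that closes the log, so that a removal path which misses one structure fails the assertion instead of silently leaking entries.
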

> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
>                 Key: RATIS-556
>                 URL: https://issues.apache.org/jira/browse/RATIS-556
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>         Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, RATIS-556_v2.patch, RATIS-556_v3.patch, RATIS-556_v4.patch
>
>
> Currently there is no way to detect node failures at the master log servers or to add new nodes to the group serving the log. We need to analyze how Ozone handles this case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)