Posted to issues@ratis.apache.org by "Lokesh Jain (Jira)" <ji...@apache.org> on 2019/11/21 16:47:00 UTC

[jira] [Comment Edited] (RATIS-458) GrpcLogAppender#shouldWait should wait on pending log entries to follower

    [ https://issues.apache.org/jira/browse/RATIS-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979413#comment-16979413 ] 

Lokesh Jain edited comment on RATIS-458 at 11/21/19 4:46 PM:
-------------------------------------------------------------

| If we use (nextIndex - matchIndex), do we expect the leader won't resend the timeout requests?

[~szetszwo] The leader currently does not resend timed-out requests. These requests are retried when an exception is received in the streamObserver onError call or when the follower sends an inconsistent reply message. In both cases we reset nextIndex so that the requests are retried.

| If for some reason the matchIndex is not updated, shouldWait() may return true forever.

If a reply from the follower is lost, shouldn't that lead to an onError call in the follower? onError would currently call the onCompleted function of the reply streamObserver at the leader, and onCompleted resets the client in the leader so that the requests are retried.

| what is problem we are observing in the current approach

We are not observing any problems in the cluster with the current approach. However, I thought it would be better to wait on the actual number of pending log entries, because the size of pendingRequests can shrink (timed-out entries are removed from it) and it also counts heartbeats.
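
To make the difference concrete, here is a minimal sketch that compares the two ways of deciding whether to wait. It is not the Ratis implementation: the class ShouldWaitSketch, its methods, the one-entry-per-request simplification, and the convention that nextIndex equals matchIndex + 1 when nothing is outstanding are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of a leader's state for one follower (not the Ratis class). */
final class ShouldWaitSketch {
  // Each element is one in-flight request; true marks a heartbeat, false an append with one entry.
  private final Deque<Boolean> pendingRequests = new ArrayDeque<>();
  private final int maxOutstanding;
  private long nextIndex;   // index of the next log entry the leader will send
  private long matchIndex;  // highest log index known to be replicated on the follower

  ShouldWaitSketch(int maxOutstanding, long matchIndex) {
    this.maxOutstanding = maxOutstanding;
    this.matchIndex = matchIndex;
    this.nextIndex = matchIndex + 1; // assumption: nothing is outstanding at start
  }

  /** Sends one log entry: the request is queued and nextIndex advances. */
  void sendEntry() {
    pendingRequests.add(Boolean.FALSE);
    nextIndex++;
  }

  /** Sends a heartbeat: the request is queued but no log index is consumed. */
  void sendHeartbeat() {
    pendingRequests.add(Boolean.TRUE);
  }

  /** A request timed out: it leaves the queue, but nextIndex and matchIndex are unchanged. */
  void timeoutOldest() {
    pendingRequests.poll();
  }

  /** The follower acknowledged entries up to newMatchIndex. */
  void onReply(long newMatchIndex) {
    pendingRequests.poll();
    matchIndex = Math.max(matchIndex, newMatchIndex);
  }

  /** Error or inconsistent reply: drop in-flight requests and resend from matchIndex + 1. */
  void resetAfterError() {
    pendingRequests.clear();
    nextIndex = matchIndex + 1;
  }

  /** Queue-based check: shrinks on timeouts and is inflated by heartbeats. */
  boolean shouldWaitByQueueSize() {
    return pendingRequests.size() >= maxOutstanding;
  }

  /** Index-based check: counts only log entries sent but not yet matched. */
  boolean shouldWaitByLogIndex() {
    return nextIndex - matchIndex - 1 >= maxOutstanding;
  }
}
```

With a limit of 2, sending two entries makes both checks return true. If the oldest request then times out, the queue-based check drops to false although two entries are still unacknowledged, while the index-based check stays true. Two heartbeats alone trip the queue-based check but not the index-based one. The index-based check only returns to false once onReply advances matchIndex or resetAfterError rewinds nextIndex, which is the concern raised above about matchIndex not being updated.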


was (Author: ljain):
| If we use (nextIndex - matchIndex), do we expect the leader won't resend the timeout requests?

[~szetszwo] The leader currently does not resend the timed out requests. These requests are retried when an exception is received in streamObserver onError call or when the follower sends an inconsistent reply message. In both these cases we reset the nextIndex so that the requests will be retried.

| If for some reason the matchIndex is not updated, shouldWait() may return true forever.

If a reply from follower is lost, it should lead to onError call in the follower? OnError would currently call the onCompleted function for the reply streamObserver at leader.

| what is problem we are observing in the current approach

We are not observing any problems in the cluster with the current approach. But I thought it would be good to wait on the actual number of pending log entries. Because pendingRequests size can shrink and it also includes heartbeats.

> GrpcLogAppender#shouldWait should wait on pending log entries to follower
> -------------------------------------------------------------------------
>
>                 Key: RATIS-458
>                 URL: https://issues.apache.org/jira/browse/RATIS-458
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Blocker
>              Labels: ozone
>         Attachments: RATIS-458.001.patch, RATIS-458.002.patch, RATIS-458.003.patch, RATIS-458.004.patch, RATIS-458.005.patch
>
>
> In GrpcLogAppender, when an append entry times out, we remove the entry from pendingRequests. This shrinks pendingRequests, which affects the logic in GrpcLogAppender#shouldWait. Furthermore, shouldWait also counts heartbeats, because heartbeats are tracked in pendingRequests. It should instead wait on the number of log entries that have been appended to the follower but not yet processed by it.
> GrpcConfigKeys.Server.leaderOutstandingAppendsMax should also be a fraction of RaftServerConfigKeys.Log.queueSize. This adds flow control for the leader's append entries to the follower, because the number of outstanding append entries in the leader would then be limited by the maximum number of operations in the raft log worker.
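
The second point of the description, deriving the outstanding-appends limit from the log queue size, could look roughly like the sketch below. The helper outstandingAppendsMax, its fraction parameter, and the numbers in the usage note are hypothetical; the sketch does not read GrpcConfigKeys or RaftServerConfigKeys and does not reflect their actual defaults.

```java
/** Hypothetical helper: derives an outstanding-appends limit from a log queue size. */
final class OutstandingAppendsLimitSketch {
  private OutstandingAppendsLimitSketch() {}

  /**
   * Returns floor(logQueueSize * fraction), but never less than 1 so the
   * leader can always have at least one append in flight.
   */
  static int outstandingAppendsMax(int logQueueSize, double fraction) {
    if (logQueueSize <= 0) {
      throw new IllegalArgumentException("logQueueSize must be positive: " + logQueueSize);
    }
    if (!(fraction > 0.0 && fraction <= 1.0)) {
      throw new IllegalArgumentException("fraction must be in (0, 1]: " + fraction);
    }
    return Math.max(1, (int) Math.floor(logQueueSize * fraction));
  }
}
```

For example, a queue size of 4096 with a fraction of 0.25 gives a limit of 1024, so the leader's in-flight appends could never exceed what the follower's log worker queue is able to hold.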


