Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2019/10/21 16:32:00 UTC

[jira] [Commented] (RATIS-726) TimeoutScheduler holds on to the raftClientRequest till it times out even though request succeeds

    [ https://issues.apache.org/jira/browse/RATIS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956244#comment-16956244 ] 

Tsz-wo Sze commented on RATIS-726:
----------------------------------

Good catch on the bug.  TimeoutScheduler is for scheduling timeout retries and has no knowledge of the requests.  We should fix the code that uses it.  Thanks a lot!
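
A minimal JDK-only sketch of the retention problem and one possible fix direction (an illustration with plain java.util.concurrent, not the actual Ratis TimeoutScheduler code; Request is a hypothetical stand-in for RaftClientRequest):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustration of how a scheduled timeout task can pin a request in memory. */
class TimeoutRetentionSketch {
  private static final ScheduledExecutorService SCHEDULER =
      Executors.newSingleThreadScheduledExecutor();

  /** Hypothetical stand-in for a RaftClientRequest carrying a large payload. */
  static final class Request {
    final long callId;
    final byte[] payload;
    Request(long callId, byte[] payload) {
      this.callId = callId;
      this.payload = payload;
    }
  }

  /**
   * Problematic pattern: the lambda captures the whole request, so the
   * scheduler's queue keeps the payload reachable for the full 3s delay,
   * even if the request completes successfully right away.
   */
  static void scheduleTimeoutHoldingRequest(Request request) {
    SCHEDULER.schedule(
        () -> System.out.println("timed out: call " + request.callId),
        3, TimeUnit.SECONDS);
  }

  /**
   * Safer pattern: capture only the call id; once the caller drops the
   * completed request, its payload becomes eligible for garbage collection.
   */
  static void scheduleTimeoutHoldingIdOnly(Request request) {
    final long callId = request.callId;
    SCHEDULER.schedule(
        () -> System.out.println("timed out: call " + callId),
        3, TimeUnit.SECONDS);
  }
}
{code}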

> TimeoutScheduler holds on to the raftClientRequest till it times out even though request succeeds
> -------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-726
>                 URL: https://issues.apache.org/jira/browse/RATIS-726
>             Project: Ratis
>          Issue Type: Bug
>          Components: client
>            Reporter: Shashikant Banerjee
>            Assignee: Tsz-wo Sze
>            Priority: Major
>
> While running freon with a 1-node Ratis setup, it was observed that the TimeoutScheduler holds on to the RaftClientRequest object for at least 3s (the default requestTimeoutDuration) even though the request has already been processed successfully and acknowledged. This creates memory pressure and causes the Ozone client to go OOM.
>  From the heap dump analysis in HDDS-2331, the timeout scheduler appears to be holding onto a total of 176 requests (88 writeChunk requests containing actual data and 88 putBlock requests), even though data is written sequentially, key by key, in Ozone.
> Thanks [~adoroszlai] for helping to discover this.
> cc [~ljain] [~msingh] [~szetszwo] [~jnpandey]
> A similar fix may be required in GrpcLogAppender as well, since it uses the same TimeoutScheduler.
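
A second JDK-only sketch of a complementary approach: cancel the scheduled timeout as soon as the reply arrives so the scheduler drops the task (and anything it references) immediately. This is an assumption about the shape of the fix, not the actual Ratis or GrpcLogAppender code:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustration of cancelling the timeout task once the reply arrives. */
class CancelOnReplySketch {
  private static final ScheduledThreadPoolExecutor SCHEDULER =
      new ScheduledThreadPoolExecutor(1);
  static {
    // Without this, a cancelled task lingers in the queue until its delay elapses.
    SCHEDULER.setRemoveOnCancelPolicy(true);
  }

  /** Hypothetical send(): completes the future on reply or fails it on timeout. */
  static CompletableFuture<String> send(long callId) {
    final CompletableFuture<String> reply = new CompletableFuture<>();
    final ScheduledFuture<?> timeout = SCHEDULER.schedule(
        () -> reply.completeExceptionally(
            new TimeoutException("call " + callId + " timed out")),
        3, TimeUnit.SECONDS);
    // As soon as the reply (or any failure) arrives, drop the timeout task so
    // the scheduler no longer references this request's state.
    reply.whenComplete((r, e) -> timeout.cancel(false));
    return reply;
  }
}
{code}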



--
This message was sent by Atlassian Jira
(v8.3.4#803005)