Posted to issues@ozone.apache.org by "Jackson Yao (Jira)" <ji...@apache.org> on 2022/01/07 12:37:00 UTC

[jira] [Commented] (HDDS-6162) limit OM request buffer size to avoid taking too much memory

    [ https://issues.apache.org/jira/browse/HDDS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470575#comment-17470575 ] 

Jackson Yao commented on HDDS-6162:
-----------------------------------

cc [~ljain] [~bharat] [~msingh]

> limit OM request buffer size to avoid taking too much memory 
> -------------------------------------------------------------
>
>                 Key: HDDS-6162
>                 URL: https://issues.apache.org/jira/browse/HDDS-6162
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Jackson Yao
>            Assignee: Jackson Yao
>            Priority: Major
>
> Currently, if OM HA is enabled, when a client request arrives at the OM leader, the request is written to the Ratis log and replicated to the other two followers.
> The OM then handles the request as follows:
> 1. The StateMachineUpdater (Ratis) applies each log entry to the OM state machine by calling StateMachine#applyTransaction.
> 2. In `applyTransaction`, a request-handling function is wrapped into `runcommand` and submitted to a single-thread thread pool.
> 3. `runcommand` is put into the *unbounded blocking queue* of the thread pool, and the single thread in the pool takes tasks from the queue and executes them one by one.
> 4. When a task is executed, the request is handled and its result is put into the omDoubleBuffer's currentBuffer, which is also an *unbounded queue*.
> 5. If currentBuffer is not empty, omDoubleBuffer swaps currentBuffer and readyBuffer.
> 6. An asynchronous flush thread adds all the requests in readyBuffer to a RocksDB batch and then commits the batch to the DB.
>  
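For readers who want steps 4 to 6 in concrete form, here is a minimal sketch of a double buffer. It is a simplified model, not Ozone's actual implementation: the class name `DoubleBufferSketch` and the methods `add`, `flushOnce`, and `pending` are hypothetical, and the RocksDB batch commit is stood in for by returning the batch as a list. It assumes a single flush thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified double buffer; not the real OM class. */
final class DoubleBufferSketch<T> {
    private Queue<T> currentBuffer = new ArrayDeque<>(); // filled by the apply thread
    private Queue<T> readyBuffer = new ArrayDeque<>();   // drained by the flush thread

    /** Step 4: append a handled request. Nothing here bounds the queue. */
    synchronized void add(T entry) {
        currentBuffer.add(entry);
        notifyAll(); // wake the flush thread if it is waiting
    }

    /** Step 5: wait until currentBuffer is non-empty, then swap the two buffers. */
    private synchronized Queue<T> swapWhenReady() throws InterruptedException {
        while (currentBuffer.isEmpty()) {
            wait();
        }
        Queue<T> emptied = readyBuffer; // already drained by the previous flush
        readyBuffer = currentBuffer;
        currentBuffer = emptied;
        return readyBuffer;
    }

    /** Step 6: one flush iteration; the returned list stands in for a DB batch. */
    List<T> flushOnce() throws InterruptedException {
        Queue<T> ready = swapWhenReady();
        List<T> batch = new ArrayList<>(ready);
        ready.clear();
        return batch; // a real implementation would commit this batch to the DB
    }

    /** Number of entries waiting in currentBuffer. */
    synchronized int pending() {
        return currentBuffer.size();
    }
}
```

While `flushOnce` is busy with a slow commit, `add` keeps accepting entries, so `pending()` can grow without limit. That is the memory growth described in the next paragraph.
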
> This can cause a problem. If there are a large number of requests and the commit operation is time-consuming, more and more requests pile up in the blocking queue of the thread pool or in the currentBuffer of omDoubleBuffer, which consumes a lot of memory. In our cluster, the two queues grew very large and led to a very long full GC (about five minutes) that reclaimed very little space. What's more, this leads to heartbeat timeouts and a Ratis leader re-election, so the OM becomes unavailable.
> The idea here is to limit the size of the queues above and make the size configurable. When the maximum size is reached, the client request should be blocked.
> By the way, Raft is strongly sequential, so every Raft request must be handled sequentially, even if the requests are independent of each other. So maybe we could refactor the current implementation of the OM state machine, perhaps along the lines of the SCM state machine.
>  
>  
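One possible shape for the proposed limit, as a sketch only: a single-thread executor whose `submit` blocks the caller once a configurable number of tasks are queued or running. The class `BoundedSingleThreadExecutor` and the parameter `maxPending` are hypothetical names for illustration and are not taken from the Ozone code base or from any patch for this issue. It uses a `Semaphore` rather than a bounded queue so that a full queue blocks the submitter instead of rejecting the task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: single worker thread with a bounded number of pending tasks. */
final class BoundedSingleThreadExecutor {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Semaphore permits;

    /** maxPending is an assumed, configurable upper bound on queued plus running tasks. */
    BoundedSingleThreadExecutor(int maxPending) {
        this.permits = new Semaphore(maxPending);
    }

    /** Blocks the caller while maxPending tasks are already queued or running. */
    void submit(Runnable task) throws InterruptedException {
        permits.acquire();
        try {
            executor.execute(() -> {
                try {
                    task.run();
                } finally {
                    permits.release(); // free the slot once the task has finished
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // task was never accepted, e.g. after shutdown
            throw e;
        }
    }

    /** Number of additional tasks that can be submitted without blocking. */
    int availableSlots() {
        return permits.availablePermits();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Tasks still run one at a time and in submission order, so the sequential handling that Raft requires is preserved. The trade-off is that a blocked `submit` stalls whichever thread calls it, so where the back-pressure is applied would need care.
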


