Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/12/29 11:32:32 UTC

[GitHub] [ozone] GlenGeng opened a new pull request #1743: HDDS-4630: Solve deadlock triggered by PipelineActionHandler.

GlenGeng opened a new pull request #1743:
URL: https://github.com/apache/ozone/pull/1743


   ## What changes were proposed in this pull request?
   
   This deadlock was found when replacing the MockRatisServer with a single-server SCMRatisServer in MiniOzoneCluster.
   It can be reproduced with TestContainerStateMachineFlushDelay#testContainerStateMachineFailures once the mock Ratis server is replaced with the real one.
   
   **The root cause is**
   Closing a pipeline first closes the open containers of that pipeline, then removes the pipeline. The contention is:
   
   1. ContainerManager has committed the log entry containing updateContainerState, and the StateMachineUpdater is applying it while waiting for the lock of PipelineManagerV2Impl. When a container transitions from open to a non-open state, it must call PipelineManager#removeContainerFromPipeline, which needs the lock of PipelineManagerV2Impl.
   
   2. PipelineActionHandler has acquired the lock of PipelineManagerV2Impl in PipelineManagerV2Impl#removePipeline(), and is waiting for StateManager#removePipeline() to be committed by Raft and applied by the StateMachineUpdater.
   
   Thus ContainerManager occupies the StateMachineUpdater while waiting for the PipelineManager lock, and PipelineActionHandler holds the PipelineManager lock while waiting for the StateMachineUpdater to apply its Raft client request.
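
As a rough illustration of the cycle above (not code from this PR), the sketch below models the StateMachineUpdater as a single-threaded executor and the PipelineManagerV2Impl lock as a plain `ReentrantReadWriteLock`. The class name `DeadlockSketch`, the method `handlerStalls`, and the timeouts are all hypothetical; the timeouts exist only so that the demo terminates instead of hanging the way the real code would.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class DeadlockSketch {
  /** Returns true if the handler's request is stuck behind the updater. */
  static boolean handlerStalls() {
    // Stand-in for the PipelineManagerV2Impl lock (assumption: a plain rw lock).
    ReentrantReadWriteLock pipelineManagerLock = new ReentrantReadWriteLock();
    // Stand-in for the StateMachineUpdater: entries are applied one at a time.
    ExecutorService stateMachineUpdater = Executors.newSingleThreadExecutor();
    try {
      // Step 2: the handler takes the manager lock in removePipeline().
      pipelineManagerLock.writeLock().lock();
      try {
        // Step 1: the updater applies updateContainerState, which needs the
        // manager lock for removeContainerFromPipeline. The real code blocks
        // forever; the sketch gives up after 2 seconds.
        stateMachineUpdater.submit(() -> {
          if (pipelineManagerLock.writeLock().tryLock(2, TimeUnit.SECONDS)) {
            pipelineManagerLock.writeLock().unlock();
          }
          return null;
        });
        // The handler's own entry is queued behind the blocked one.
        Future<?> removePipeline = stateMachineUpdater.submit(() -> { });
        try {
          removePipeline.get(200, TimeUnit.MILLISECONDS);
          return false; // applied in time: no stall
        } catch (TimeoutException e) {
          return true;  // still waiting while holding the lock: the cycle
        }
      } finally {
        pipelineManagerLock.writeLock().unlock();
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      stateMachineUpdater.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("handler stalled: " + handlerStalls());
  }
}
```

The handler only gets unblocked here because the sketch releases the lock after its wait times out; without the timeouts neither side could make progress.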
   
   **The solution is**
   We have PipelineManager and PipelineStateManager, and ContainerManager and ContainerStateManager; each has its own rw lock.
   Let's discuss PipelineManager and PipelineStateManager first.
    
   PipelineStateManager contains the in-memory state and the RocksDB store. It uses a rw lock to keep the in-memory state and RocksDB consistent. This was done in https://github.com/apache/ozone/pull/1676
   
   A write request must acquire the write lock before modifying state, and a read request must acquire the read lock before reading. All write requests come from the StateMachineUpdater, and read requests come mainly from foreground requests, which means all modifications go through Ratis.
   
   For the non-HA code, the rw lock in PipelineManager is the only protection for thread safety; there is no lock in PipelineStateManager. For the HA code, we have to rely on the rw lock in PipelineStateManager to ensure thread safety.
    
   Since most of the lock operations in PipelineManager and PipelineStateManager are currently duplicated, we can relax the lock in PipelineManager and use it only to ensure that there is at most one ongoing Ratis operation. The previous logic acquired the write lock of PipelineManager and then issued the Raft client request, so Ratis client requests were serialized; we simply keep that behavior.
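
One way to picture the relaxed scheme is the minimal sketch below. It is an illustration under stated assumptions, not the code in this PR: `RelaxedLockSketch`, `requestLock`, `stateLock`, `replicate`, and `closeContainer` are hypothetical names, a single-threaded executor stands in for the StateMachineUpdater, and an in-memory map stands in for the state manager's in-memory state and RocksDB. The key assumption is that the apply path takes only the state-manager lock, while the manager lock does nothing except serialize outgoing requests.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RelaxedLockSketch {
  // Manager lock: only guarantees at most one in-flight replicated request.
  private final ReentrantLock requestLock = new ReentrantLock();
  // State-manager lock: guards the state itself, on the apply and read paths.
  private final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
  private final Map<String, Set<String>> containersByPipeline = new HashMap<>();
  // Stand-in for the StateMachineUpdater.
  private final ExecutorService updater = Executors.newSingleThreadExecutor();

  // Apply path: never touches requestLock, so it cannot wait on a handler.
  private void apply(Runnable mutation) {
    stateLock.writeLock().lock();
    try {
      mutation.run();
    } finally {
      stateLock.writeLock().unlock();
    }
  }

  private static void await(Future<?> f) {
    try {
      f.get(2, TimeUnit.SECONDS);
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
      throw new IllegalStateException(e);
    }
  }

  // Request path: hold requestLock while waiting for the entry to be applied.
  private void replicate(Runnable mutation) {
    requestLock.lock();
    try {
      await(updater.submit(() -> apply(mutation)));
    } finally {
      requestLock.unlock();
    }
  }

  void addPipeline(String pipeline, String container) {
    replicate(() -> containersByPipeline
        .computeIfAbsent(pipeline, k -> new HashSet<>()).add(container));
  }

  void removePipeline(String pipeline) {
    replicate(() -> containersByPipeline.remove(pipeline));
  }

  // Stand-in for a committed updateContainerState entry from ContainerManager.
  Future<?> closeContainer(String pipeline, String container) {
    return updater.submit(() -> apply(() -> {
      Set<String> open = containersByPipeline.get(pipeline);
      if (open != null) {
        open.remove(container);
      }
    }));
  }

  boolean hasPipeline(String pipeline) {
    stateLock.readLock().lock();
    try {
      return containersByPipeline.containsKey(pipeline);
    } finally {
      stateLock.readLock().unlock();
    }
  }

  static boolean demo() {
    RelaxedLockSketch scm = new RelaxedLockSketch();
    try {
      scm.addPipeline("p1", "c1");
      boolean created = scm.hasPipeline("p1");
      Future<?> close = scm.closeContainer("p1", "c1");
      scm.removePipeline("p1"); // waits on the updater, which is never blocked
      await(close);
      return created && !scm.hasPipeline("p1");
    } finally {
      scm.updater.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("completed without stalling: " + demo());
  }
}
```

Because nothing on the apply path waits for `requestLock`, a handler that holds it while waiting for the updater can no longer be part of a cycle, and requests are still issued one at a time as before.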
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-4630
   
   ## How was this patch tested?
   
   CI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #1743: HDDS-4630: Solve deadlock triggered by PipelineActionHandler.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1743:
URL: https://github.com/apache/ozone/pull/1743#issuecomment-755941203


   +1. 
   
   Hi @nandakumar131, I'm going to merge this PR first to enable SCM HA in MiniOzoneCluster. We can revisit the lock usage in PipelineManager and ContainerManager later to see whether it can be made lock-free.
   
   




[GitHub] [ozone] GlenGeng commented on pull request #1743: HDDS-4630: Solve deadlock triggered by PipelineActionHandler.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #1743:
URL: https://github.com/apache/ozone/pull/1743#issuecomment-752324532


   cc @timmylicheng 




[GitHub] [ozone] GlenGeng commented on pull request #1743: HDDS-4630: Solve deadlock triggered by PipelineActionHandler.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #1743:
URL: https://github.com/apache/ozone/pull/1743#issuecomment-752321387


   cc @nandakumar131 




[GitHub] [ozone] ChenSammi merged pull request #1743: HDDS-4630: Solve deadlock triggered by PipelineActionHandler.

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #1743:
URL: https://github.com/apache/ozone/pull/1743


   

