Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2023/08/16 10:38:00 UTC

[jira] [Updated] (HDDS-9151) Close EC Pipeline when container transitions to closing

     [ https://issues.apache.org/jira/browse/HDDS-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-9151:
------------------------------------
    Description: 
In testing we have found an issue in the ECWritableContainerProvider.

For EC, a pipeline is used for only one container; when the container gets closed, the pipeline also gets closed. At the moment, the only place in the code that closes EC pipelines which no longer have an open container is inside the ECWritableContainerProvider. It first gets the list of open pipelines and enforces the pipeline limit, then, for all open pipelines, it tries to find one the client can use.

If the client has had problems writing to the pipelines (e.g. it was given a container/pipeline and then the write failed because the container was closed on the DN), the pipelines get added to the exclude list. Then we can get into a situation where many pipelines need to be closed on the write path, slowing down block allocation.
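
To make the cost concrete, here is a rough model of a write-path scan that closes stale pipelines as it goes. It is only a sketch under assumptions: WritePathCleanupSketch, Pipeline, findUsable and closesOnWritePath are hypothetical names invented for illustration, and the logic is a simplification, not the actual ECWritableContainerProvider code.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of block allocation over EC pipelines.
// These are NOT the real Ozone classes; all names are illustrative.
class WritePathCleanupSketch {
  static final class Pipeline {
    final String id;
    final boolean containerOpen; // false once the container left OPEN
    boolean closed;

    Pipeline(String id, boolean containerOpen) {
      this.id = id;
      this.containerOpen = containerOpen;
    }
  }

  // Counts how much cleanup work happened inside allocation calls.
  int closesOnWritePath = 0;

  // Scans the open pipelines for one the client can use. Any pipeline whose
  // container is no longer open is closed and dropped during the scan, so
  // the caller (the write path) pays for that cleanup.
  Pipeline findUsable(List<Pipeline> openPipelines, Set<String> excluded) {
    Iterator<Pipeline> it = openPipelines.iterator();
    while (it.hasNext()) {
      Pipeline p = it.next();
      if (!p.containerOpen) {
        p.closed = true;
        closesOnWritePath++;
        it.remove();
        continue;
      }
      if (!excluded.contains(p.id)) {
        return p;
      }
    }
    return null; // nothing usable; a real system would allocate a new one
  }
}
```

In this model, every stale pipeline ahead of the first usable one adds one close operation to a single allocation call, which is the slowdown described above.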

Ideally, when a container transitions to CLOSING in SCM, if the container is an EC container, we should also close the associated pipeline to avoid it counting toward the limit and to avoid needing to close it during the write (block allocation) path.

This could be achieved relatively simply inside the PipelineManagerImpl.removeContainersFromPipeline() method, which is called as soon as the container transitions to CLOSING via ContainerStateManagerImpl.updateContainerState() when it executes the containerStateChangeActions. Wrapping the container close and pipeline close in a lock inside PipelineManagerImpl ensures we have a consistent "ec container close" flow, and it should avoid the ECWritableContainerProvider needing to close the pipelines internally. However, we can leave that code in place in ECWritableContainerProvider in case some pipelines slip through somehow.
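
A minimal sketch of that flow, assuming a much simplified in-memory model of the pipeline bookkeeping. EcPipelineCloseSketch, onContainerClosing, stateOf and openPipelineCount are hypothetical names made up for this example; they are not the PipelineManagerImpl API, and the real change may look quite different.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of SCM pipeline bookkeeping.
// None of these names are the real Ozone classes or methods.
class EcPipelineCloseSketch {
  enum ReplicationType { RATIS, EC }
  enum PipelineState { OPEN, CLOSED }

  private final ReentrantLock lock = new ReentrantLock();
  private final Map<String, ReplicationType> types = new HashMap<>();
  private final Map<String, PipelineState> states = new HashMap<>();
  private final Map<String, Set<Long>> containers = new HashMap<>();

  void addPipeline(String pipelineId, ReplicationType type) {
    lock.lock();
    try {
      types.put(pipelineId, type);
      states.put(pipelineId, PipelineState.OPEN);
      containers.put(pipelineId, new HashSet<>());
    } finally {
      lock.unlock();
    }
  }

  // Assumes the pipeline was registered with addPipeline first.
  void addContainer(String pipelineId, long containerId) {
    lock.lock();
    try {
      containers.get(pipelineId).add(containerId);
    } finally {
      lock.unlock();
    }
  }

  // Called when a container moves to CLOSING. Detaching the container and
  // closing the EC pipeline happen under one lock, so no other thread sees
  // an EC pipeline that is still OPEN but has lost its container.
  void onContainerClosing(String pipelineId, long containerId) {
    lock.lock();
    try {
      Set<Long> attached = containers.get(pipelineId);
      if (attached == null) {
        return; // unknown pipeline, nothing to do
      }
      attached.remove(containerId);
      if (types.get(pipelineId) == ReplicationType.EC) {
        states.put(pipelineId, PipelineState.CLOSED);
      }
    } finally {
      lock.unlock();
    }
  }

  PipelineState stateOf(String pipelineId) {
    lock.lock();
    try {
      return states.get(pipelineId);
    } finally {
      lock.unlock();
    }
  }

  // Number of pipelines of the given type that still count toward a limit.
  long openPipelineCount(ReplicationType type) {
    lock.lock();
    try {
      return states.entrySet().stream()
          .filter(e -> e.getValue() == PipelineState.OPEN)
          .filter(e -> types.get(e.getKey()) == type)
          .count();
    } finally {
      lock.unlock();
    }
  }
}
```

In this model, an EC pipeline stops counting toward the limit as soon as its container starts closing, while non-EC pipelines are left open, so the allocation path no longer has stale EC pipelines to clean up in the common case.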

  was:
In testing we have found an issue in the ECWritableContainerProvider.

For EC, a pipeline is used for only one container; when the container gets closed, the pipeline also gets closed. However, closing the container can take some time. First it is marked as CLOSING in SCM, then SCM sends commands to the DNs to close it, and finally the container gets CLOSED.

As we limit the number of pipelines in SCM, containers in a CLOSING state mean there are containers/pipelines which are effectively closed, but the pipelines are still counted toward the limit.

Ideally, when a container transitions to CLOSING in SCM, if the container is an EC container, we should also close the associated pipeline to avoid it counting toward the limit.

This could be achieved relatively simply inside the PipelineManagerImpl.removeContainersFromPipeline() method which is called as soon as the container transitions to CLOSING via ContainerStateManagerImpl.updateContainerState() when it executes the containerStateChangeActions.


> Close EC Pipeline when container transitions to closing
> -------------------------------------------------------
>
>                 Key: HDDS-9151
>                 URL: https://issues.apache.org/jira/browse/HDDS-9151
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> In testing we have found an issue in the ECWritableContainerProvider.
> For EC, a pipeline is used for only one container; when the container gets closed, the pipeline also gets closed. At the moment, the only place in the code that closes EC pipelines which no longer have an open container is inside the ECWritableContainerProvider. It first gets the list of open pipelines and enforces the pipeline limit, then, for all open pipelines, it tries to find one the client can use.
> If the client has had problems writing to the pipelines (e.g. it was given a container/pipeline and then the write failed because the container was closed on the DN), the pipelines get added to the exclude list. Then we can get into a situation where many pipelines need to be closed on the write path, slowing down block allocation.
> Ideally, when a container transitions to CLOSING in SCM, if the container is an EC container, we should also close the associated pipeline to avoid it counting toward the limit and to avoid needing to close it during the write (block allocation) path.
> This could be achieved relatively simply inside the PipelineManagerImpl.removeContainersFromPipeline() method, which is called as soon as the container transitions to CLOSING via ContainerStateManagerImpl.updateContainerState() when it executes the containerStateChangeActions. Wrapping the container close and pipeline close in a lock inside PipelineManagerImpl ensures we have a consistent "ec container close" flow, and it should avoid the ECWritableContainerProvider needing to close the pipelines internally. However, we can leave that code in place in ECWritableContainerProvider in case some pipelines slip through somehow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
