You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2022/02/09 17:03:00 UTC

[jira] [Updated] (HDDS-6292) Ensure immutable ContainerReplica set is returned from ContainerStateManagerImpl

     [ https://issues.apache.org/jira/browse/HDDS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-6292:
------------------------------------
    Description: 
Inside ContainerStateMap, the replicas for a container are stored in a Set backed by a ConcurrentHashMap.

When you ask for the current replicas of a container, this method is used:
{code:java}
public Set<ContainerReplica> getContainerReplicas(
      final ContainerID containerID) {
    Preconditions.checkNotNull(containerID);
    final Set<ContainerReplica> replicas = replicaMap.get(containerID);
    return replicas == null ? null : Collections.unmodifiableSet(replicas);
} {code}
Note that it pulls out the Set, wraps it as unmodifiable and returns it.

There is a problem here, in that if the Set is updated by ICR / FCR at the same time as another part of the code has taken a reference to it, the other part of the code can make incorrect decisions. Eg:

 
{code:java}
Set<> replicas = getContainerReplicas(...)
replicaCount = replicas.size()
// continue to do something based on the size{code}
ReplicationManger has run into a race condition like this. We also use the Replicas to form pipelines for closed containers, so I worry there could be some strange issues if the set if mutated during the pipeline creation.

I see two possible solutions here. `GetContainerReplicas` should create a copy of the Set and return that, so the copy the other part of the code gets is its own copy and nothing can change it.

Or, we make the Set immutable, so that each new replica details are received, we create the new copy of the set and store that. Then any other parts of the code can get a reference to it, and know it will never change.

Mutations to the replicas for a closed container will only happen with FCR, which is relatively rare.

However we may ask for read pipelines very frequently, so it would be cheaper overall to use option 2.

It we go with option 2, I think we can move from a concurrentHashMap to a plain hashMap too, which may make the memory footprint slightly smaller.

Note access to the replicas is via ContainerStateManagerImpl, which already has a course RW lock protecting access to the container manager. Quite possibly FCR reporting could be improved by a finer grained or striped lock.

This problem was reported in HDDS-5643.

  was:
Inside ContainerStateMap, the replicas for a container are stored in a Set backed by a ConcurrentHashMap.

When you ask for the current replicas of a container, this method is used:
{code:java}
public Set<ContainerReplica> getContainerReplicas(
      final ContainerID containerID) {
    Preconditions.checkNotNull(containerID);
    final Set<ContainerReplica> replicas = replicaMap.get(containerID);
    return replicas == null ? null : Collections.unmodifiableSet(replicas);
} {code}
Note that it pulls out the Set, wraps it as unmodifiable and returns it.

There is a problem here, in that if the Set is updated by ICR / FCR at the same time as another part of the code has taken a reference to it, the other part of the code can make incorrect decisions. Eg:

 
{code:java}
Set<> replicas = getContainerReplicas(...)
replicaCount = replicas.size()
// continue to do something based on the size{code}
ReplicationManger has run into a race condition like this. We also use the Replicas to form pipelines for closed containers, so I worry there could be some strange issues if the set if mutated during the pipeline creation.

I see two possible solutions here. `GetContainerReplicas` should create a copy of the Set and return that, so the copy the other part of the code gets is its own copy and nothing can change it.

Or, we make the Set immutable, so that each new replica details are received, we create the new copy of the set and store that. Then any other parts of the code can get a reference to it, and know it will never change.

Mutations to the replicas for a closed container will only happen with FCR, which is relatively rare.

However we may ask for read pipelines very frequently, so it would be cheaper overall to use option 2.

It we go with option 2, I think we can move from a concurrentHashMap to a plain hashMap too, which may make the memory footprint slightly smaller.

Note access to the replicas is via ContainerStateManagerImpl, which already has a course RW lock protecting access to the container manager. Quite possibly FCR reporting could be improved by a finer grained or striped lock.


> Ensure immutable ContainerReplica set is returned from ContainerStateManagerImpl
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-6292
>                 URL: https://issues.apache.org/jira/browse/HDDS-6292
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Inside ContainerStateMap, the replicas for a container are stored in a Set backed by a ConcurrentHashMap.
> When you ask for the current replicas of a container, this method is used:
> {code:java}
> public Set<ContainerReplica> getContainerReplicas(
>       final ContainerID containerID) {
>     Preconditions.checkNotNull(containerID);
>     final Set<ContainerReplica> replicas = replicaMap.get(containerID);
>     return replicas == null ? null : Collections.unmodifiableSet(replicas);
> } {code}
> Note that it pulls out the Set, wraps it as unmodifiable and returns it.
> There is a problem here, in that if the Set is updated by ICR / FCR at the same time as another part of the code has taken a reference to it, the other part of the code can make incorrect decisions. Eg:
>  
> {code:java}
> Set<> replicas = getContainerReplicas(...)
> replicaCount = replicas.size()
> // continue to do something based on the size{code}
> ReplicationManger has run into a race condition like this. We also use the Replicas to form pipelines for closed containers, so I worry there could be some strange issues if the set if mutated during the pipeline creation.
> I see two possible solutions here. `GetContainerReplicas` should create a copy of the Set and return that, so the copy the other part of the code gets is its own copy and nothing can change it.
> Or, we make the Set immutable, so that each new replica details are received, we create the new copy of the set and store that. Then any other parts of the code can get a reference to it, and know it will never change.
> Mutations to the replicas for a closed container will only happen with FCR, which is relatively rare.
> However we may ask for read pipelines very frequently, so it would be cheaper overall to use option 2.
> It we go with option 2, I think we can move from a concurrentHashMap to a plain hashMap too, which may make the memory footprint slightly smaller.
> Note access to the replicas is via ContainerStateManagerImpl, which already has a course RW lock protecting access to the container manager. Quite possibly FCR reporting could be improved by a finer grained or striped lock.
> This problem was reported in HDDS-5643.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org