Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2023/03/15 21:36:00 UTC
[jira] [Updated] (HDDS-8172) Duplicate replicateContainerCommand Being Sent by SCM

     [ https://issues.apache.org/jira/browse/HDDS-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-8172:
------------------------------------
    Description: 
For an EC container that has two replicas with the same replica index, one on a decommissioning node and one on an in_maintenance node, the decommission logic in ECUnderReplicationHandler can send a replication command for that index, and the maintenance logic can then send a second replication command for the same container to a different target. If both commands succeed, the container will likely end up over-replicated.

To solve this, we probably need to update the pending ops between each stage of the processing, so that the maintenance logic sees the index as "fixed by pending" and does not send the second command.
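A minimal Java sketch of the problem and the proposed fix, under stated assumptions. Everything in it is hypothetical: the `PendingOpsSketch` class, its `Replica`/`Command` records and the `process` method are illustrative stand-ins and do not reflect the real ECUnderReplicationHandler or SCM pending-ops APIs. Pending ops are reduced to a set of replica indexes with an in-flight add, and targets are simply taken in order from a list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: none of these types mirror the real SCM classes.
public class PendingOpsSketch {

  enum State { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }

  record Replica(int index, String node, State state) {}

  record Command(int index, String target) {}

  // True if some replica with this index is on an IN_SERVICE node.
  static boolean hasInService(List<Replica> replicas, int index) {
    return replicas.stream()
        .anyMatch(r -> r.index() == index && r.state() == State.IN_SERVICE);
  }

  // Runs the decommission stage, then the maintenance stage. When
  // updatePending is true, every command sent is recorded as a pending
  // add, so a later stage treats that index as "fixed by pending".
  static List<Command> process(List<Replica> replicas, List<String> targets,
      boolean updatePending) {
    List<Command> sent = new ArrayList<>();
    // Replica indexes with an in-flight add; empty when processing starts.
    Set<Integer> pendingAdds = new HashSet<>();
    int nextTarget = 0;
    for (State stage : List.of(State.DECOMMISSIONING, State.IN_MAINTENANCE)) {
      for (Replica r : replicas) {
        if (r.state() != stage || hasInService(replicas, r.index())
            || pendingAdds.contains(r.index())) {
          continue; // wrong stage, healthy copy exists, or already pending
        }
        // Assumes the target list is long enough for the sample input.
        sent.add(new Command(r.index(), targets.get(nextTarget++)));
        if (updatePending) {
          pendingAdds.add(r.index());
        }
      }
    }
    return sent;
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica(5, "dn-a", State.DECOMMISSIONING),
        new Replica(5, "dn-b", State.IN_MAINTENANCE));
    List<String> targets = List.of("dn-x", "dn-y");
    System.out.println("without fix: " + process(replicas, targets, false));
    System.out.println("with fix:    " + process(replicas, targets, true));
  }
}
```

With the two replicas from the description (index 5 on a decommissioning node and on an in_maintenance node), `process(..., false)` returns two commands for index 5, to dn-x and dn-y, mirroring the duplicate seen in the logs; `process(..., true)` returns only the command to dn-x, because the maintenance stage finds index 5 already pending.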

  was:
Duplicate replication commands [replicateContainerCommand] are being sent by SCM for the same container.

 
{code:java}
2023-03-15 04:30:01,642 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [replicateContainerCommand: containerId: 11001, replicaIndex: 5, sourceNodes: [56f83447-2cec-4137-8b82-15ee1bc200a9(host-6.host.root.hwx.site/172.27.xxx.xxx)]] for container ContainerInfo{id=#11001, state=CLOSED, pipelineID=PipelineID=1fab690a-2176-4fb0-a0e8-4243f57af4fd, stateEnterTime=2023-03-15T04:09:54.255Z, owner=om2} to dfcc61cb-deed-453e-8a8d-c34bb73a4ada(host-1.host.root.hwx.site/172.27.xx.xxx)

2023-03-15 04:30:01,642 INFO org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [replicateContainerCommand: containerId: 11001, replicaIndex: 5, sourceNodes: [56f83447-2cec-4137-8b82-15ee1bc200a9(hostname-6.hostname.root.hwx.site/172.27.xxx.xx)]] for container ContainerInfo{id=#11001, state=CLOSED, pipelineID=PipelineID=1fab690a-2176-4fb0-a0e8-4243f57af4fd, stateEnterTime=2023-03-15T04:09:54.255Z, owner=om2} to 1b834c42-7a2e-4154-93d4-b8391893d000(hostname-9.hostname.root.hwx.site/172.27.xxx.xx) {code}
 

 


> Duplicate replicateContainerCommand Being Sent by SCM
> -----------------------------------------------------
>
>                 Key: HDDS-8172
>                 URL: https://issues.apache.org/jira/browse/HDDS-8172
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Arun Sarin
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
