You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/23 04:46:21 UTC

[GitHub] [ozone] JacksonYao287 opened a new pull request, #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

JacksonYao287 opened a new pull request, #3445:
URL: https://github.com/apache/ozone/pull/3445

   ## What changes were proposed in this pull request?
   
   add a background  service to schedule thread to purge the expired replication ops.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6771
   
   
   ## How was this patch tested?
   
   UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3445:
URL: https://github.com/apache/ozone/pull/3445#issuecomment-1135343178

   i have force-updated this patch. in the new patch:
   1 add a common background scm service final class. after investigating , i think no need to add an abstract class, a final class is enough. we can create different background scm service just by changing the parameters in the builders. the task schedule logic can be shared naturally. for backgroundPipelineScrubber, we can just change the periodical task to `pipelineManager.scrubPipelines()`. for ReplicationManager, we can add a internal background scm service and set the periodical task to `processALL()`
   
   2 I also am not sure if we should be adding config in the way i have done here, or using the "Type Safe" approach. so i keep the approach like this for now, and can be changed later after we reach an agreement on this top. seems not complex.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r880344115


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/BackgroundSCMService.java:
##########
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.hdds.scm.ha;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.ratis.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Clock;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * A common implementation for background SCMService.
+ * */
+public final class BackgroundSCMService implements SCMService {
+  private final Logger log;
+  private final AtomicBoolean running = new AtomicBoolean(false);
+  private Thread backgroundThread;
+  private final Lock serviceLock = new ReentrantLock();
+  private final SCMContext scmContext;
+  private ServiceStatus serviceStatus = ServiceStatus.PAUSING;
+  private final long intervalInMillis;
+  private final long waitTimeInMillis;
+  private long lastTimeToBeReadyInMillis = 0;
+  private final Clock clock;
+  private String serviceName;
+  private Runnable periodicalTask;
+  private volatile boolean runImmediately = false;
+
+  private BackgroundSCMService(
+      final Clock clock, final SCMContext scmContext,
+      final String serviceName, final long intervalInMillis,
+      final long waitTimeInMillis, final Runnable task) {
+    this.scmContext = scmContext;
+    this.clock = clock;
+    this.periodicalTask = task;
+    this.serviceName = serviceName;
+    this.log = LoggerFactory.getLogger(serviceName);
+    this.intervalInMillis = intervalInMillis;
+    this.waitTimeInMillis = waitTimeInMillis;
+    start();
+  }
+
+  @Override
+  public void start() {
+    if (!running.compareAndSet(false, true)) {
+      log.info("{} Service is already running, skip start.", getServiceName());
+      return;
+    }
+    log.info("Starting {} Service.", getServiceName());
+
+    backgroundThread = new Thread(this::run);
+    backgroundThread.setName(serviceName + "Thread");
+    backgroundThread.setDaemon(true);
+    backgroundThread.start();
+  }
+
+  @Override
+  public void notifyStatusChanged() {
+    serviceLock.lock();
+    try {
+      if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()) {
+        if (serviceStatus != ServiceStatus.RUNNING) {
+          log.info("Service {} transitions to RUNNING.", getServiceName());
+          serviceStatus = ServiceStatus.RUNNING;
+          lastTimeToBeReadyInMillis = clock.millis();
+        }
+      } else {
+        if (serviceStatus != ServiceStatus.PAUSING) {
+          log.info("Service {} transitions to PAUSING.", getServiceName());
+          serviceStatus = ServiceStatus.PAUSING;
+        }
+      }
+    } finally {
+      serviceLock.unlock();
+    }
+  }
+
+  private void run() {
+    while (running.get()) {
+      try {
+        if (shouldRun()) {
+          periodicalTask.run();
+        }
+        synchronized (this) {
+          if (!runImmediately) {
+            wait(intervalInMillis);
+          }
+          runImmediately = false;
+        }
+      } catch (InterruptedException e) {

Review Comment:
   Should we catch and perhaps log any exceptions here? I've seen cases where these sort of background threads get an error and then just exit with no logging at all. I think we should at least log the error, and consider letting the job retry next time.
   
   However the way the code is structured now, if we get an error in `periodicalTask.run()` and catch it to let it retry, it will not wait and will retry over and over immediately. We probably don't want that either!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r879322611


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ExpiredContainerReplicaOpScrubber.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.ha.SCMContext;
+import org.apache.hadoop.hdds.scm.ha.SCMService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Clock;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * Background service to clean up expired container replica operations.
+ */
+public class ExpiredContainerReplicaOpScrubber implements SCMService {

Review Comment:
   yea , i noticed this too. i wanted to create a separate jira to do this work.  but we can also do it in this jira, will try to do it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r879329027


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ExpiredContainerReplicaOpScrubber.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.ha.SCMContext;
+import org.apache.hadoop.hdds.scm.ha.SCMService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Clock;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * Background service to clean up expired container replica operations.
+ */
+public class ExpiredContainerReplicaOpScrubber implements SCMService {

Review Comment:
   Lets try to create the abstract class in this Jira and let this service use it. Also try to design it so that BackGroundPipelineScrubber and hopefully ReplicaitonManager could use it, but don't change them in this Jira. We can have a separate jira to refactor them to use it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r880355277


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java:
##########
@@ -625,6 +626,36 @@ private void initializeSystemManagers(OzoneConfiguration conf,
     }
 
     containerReplicaPendingOps = new ContainerReplicaPendingOps(conf, clock);
+
+    long intervalInMillis = conf.getTimeDuration(
+        ScmConfigKeys
+            .OZONE_SCM_EXPIRED_CONTAINER_REPLICA_OP_SCRUB_INTERVAL,
+        ScmConfigKeys
+            .OZONE_SCM_EXPIRED_CONTAINER_REPLICA_OP_SCRUB_INTERVAL_DEFAULT,
+        TimeUnit.MILLISECONDS);
+
+    long expiryMilliSeconds = conf.getTimeDuration(

Review Comment:
   Just thinking that we may have different background threads if we reuse this class - maybe these variables should have a better name, eg containerReplicaOpExipryMs, containerReplicaOpScrubberIntervalMs.
   
   waitTimeInMillis can be re-used for several services, `backgroundServiceSafemodeWaitMs`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r880518331


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/BackgroundSCMService.java:
##########
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.hdds.scm.ha;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.ratis.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Clock;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * A common implementation for background SCMService.
+ * */
+public final class BackgroundSCMService implements SCMService {
+  private final Logger log;
+  private final AtomicBoolean running = new AtomicBoolean(false);
+  private Thread backgroundThread;
+  private final Lock serviceLock = new ReentrantLock();
+  private final SCMContext scmContext;
+  private ServiceStatus serviceStatus = ServiceStatus.PAUSING;
+  private final long intervalInMillis;
+  private final long waitTimeInMillis;
+  private long lastTimeToBeReadyInMillis = 0;
+  private final Clock clock;
+  private String serviceName;
+  private Runnable periodicalTask;
+  private volatile boolean runImmediately = false;
+
+  private BackgroundSCMService(
+      final Clock clock, final SCMContext scmContext,
+      final String serviceName, final long intervalInMillis,
+      final long waitTimeInMillis, final Runnable task) {
+    this.scmContext = scmContext;
+    this.clock = clock;
+    this.periodicalTask = task;
+    this.serviceName = serviceName;
+    this.log = LoggerFactory.getLogger(serviceName);
+    this.intervalInMillis = intervalInMillis;
+    this.waitTimeInMillis = waitTimeInMillis;
+    start();
+  }
+
+  @Override
+  public void start() {
+    if (!running.compareAndSet(false, true)) {
+      log.info("{} Service is already running, skip start.", getServiceName());
+      return;
+    }
+    log.info("Starting {} Service.", getServiceName());
+
+    backgroundThread = new Thread(this::run);
+    backgroundThread.setName(serviceName + "Thread");
+    backgroundThread.setDaemon(true);
+    backgroundThread.start();
+  }
+
+  @Override
+  public void notifyStatusChanged() {
+    serviceLock.lock();
+    try {
+      if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()) {
+        if (serviceStatus != ServiceStatus.RUNNING) {
+          log.info("Service {} transitions to RUNNING.", getServiceName());
+          serviceStatus = ServiceStatus.RUNNING;
+          lastTimeToBeReadyInMillis = clock.millis();
+        }
+      } else {
+        if (serviceStatus != ServiceStatus.PAUSING) {
+          log.info("Service {} transitions to PAUSING.", getServiceName());
+          serviceStatus = ServiceStatus.PAUSING;
+        }
+      }
+    } finally {
+      serviceLock.unlock();
+    }
+  }
+
+  private void run() {
+    while (running.get()) {
+      try {
+        if (shouldRun()) {
+          periodicalTask.run();
+        }
+        synchronized (this) {
+          if (!runImmediately) {
+            wait(intervalInMillis);
+          }
+          runImmediately = false;
+        }
+      } catch (InterruptedException e) {

Review Comment:
   `Runnable` can't throw a checked exception because it is not declared to do so. we can't throw checked exceptions without declaring them. if there is any exception in `run`, it should be caught and handled(Eg. log this error) inside itself , but not be thrown out.
   so , i suggest we keep the code like this and implement error or exception handling logic inside `periodicalTask.run()` itself



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java:
##########
@@ -625,6 +626,36 @@ private void initializeSystemManagers(OzoneConfiguration conf,
     }
 
     containerReplicaPendingOps = new ContainerReplicaPendingOps(conf, clock);
+
+    long intervalInMillis = conf.getTimeDuration(
+        ScmConfigKeys
+            .OZONE_SCM_EXPIRED_CONTAINER_REPLICA_OP_SCRUB_INTERVAL,
+        ScmConfigKeys
+            .OZONE_SCM_EXPIRED_CONTAINER_REPLICA_OP_SCRUB_INTERVAL_DEFAULT,
+        TimeUnit.MILLISECONDS);
+
+    long expiryMilliSeconds = conf.getTimeDuration(

Review Comment:
   looks good, will change the variable name 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r879319891


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -48,10 +50,19 @@ public class ContainerReplicaPendingOps {
       pendingOps = new ConcurrentHashMap<>();
   private final Striped<ReadWriteLock> stripedLock = Striped.readWriteLock(64);
 
-  public ContainerReplicaPendingOps(final ConfigurationSource conf,
-      Clock clock) {
+  private final ExpiredContainerReplicaOpScrubber
+      expiredContainerReplicaOpScrubber;
+
+  public ContainerReplicaPendingOps(
+      final ConfigurationSource conf,
+      final Clock clock,
+      final SCMServiceManager serviceManager,
+      final SCMContext scmContext) {

Review Comment:
   thanks for the suggestion, looks good.  will do it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r879300197


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -48,10 +50,19 @@ public class ContainerReplicaPendingOps {
       pendingOps = new ConcurrentHashMap<>();
   private final Striped<ReadWriteLock> stripedLock = Striped.readWriteLock(64);
 
-  public ContainerReplicaPendingOps(final ConfigurationSource conf,
-      Clock clock) {
+  private final ExpiredContainerReplicaOpScrubber
+      expiredContainerReplicaOpScrubber;
+
+  public ContainerReplicaPendingOps(
+      final ConfigurationSource conf,
+      final Clock clock,
+      final SCMServiceManager serviceManager,
+      final SCMContext scmContext) {

Review Comment:
   If we are going to have a separate "scrubber" class, then I don't think ContainerReplicaPendingOps needs a dependency on SCMServiceManager or SCMContext. It also doesn't need to keep a reference to the Scrubber thread.
   
   The scrubber thread will be the "service" that needs registered with ServiceManager, and it will need a reference to the ContainerReplicaPendingOps instance. Even the scrubber does not need a dependency on ServiceManager - we can register it like:
   
   ```
   Scrubber scrubber = new Scrubber(containerReplicaPendingOps);
   serviceManager.register(scrubber);
   ```
   
   That reduces the dependencies all the class have.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ExpiredContainerReplicaOpScrubber.java:
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.hdds.scm.container.replication;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.ha.SCMContext;
+import org.apache.hadoop.hdds.scm.ha.SCMService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Clock;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * Background service to clean up expired container replica operations.
+ */
+public class ExpiredContainerReplicaOpScrubber implements SCMService {

Review Comment:
   I have seen a few of these "service" classes, and there is a lot of repeated code in them, some of it tricky concurrent code with `wait()` and `notify()` etc. I think this is an opportunity to create an abstract class that that implements the service interfaces and provides a common set of code that other services can use going forward. If we create that, perhaps we can make the new replicationManager use it, and there was another service added recently for EC - BackGroundPipelineScrubber - its a very simple class like this one. I'm sure we can create a common "service" class these classes can extend.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3445:
URL: https://github.com/apache/ozone/pull/3445#issuecomment-1134530562

   thanks @sodonnel for the review. i will update this PR later


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3445:
URL: https://github.com/apache/ozone/pull/3445#discussion_r880352618


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/BackgroundSCMService.java:
##########
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.hdds.scm.ha;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.ratis.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Clock;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * A common implementation for background SCMService.
+ * */
+public final class BackgroundSCMService implements SCMService {
+  private final Logger log;
+  private final AtomicBoolean running = new AtomicBoolean(false);
+  private Thread backgroundThread;
+  private final Lock serviceLock = new ReentrantLock();
+  private final SCMContext scmContext;
+  private ServiceStatus serviceStatus = ServiceStatus.PAUSING;
+  private final long intervalInMillis;
+  private final long waitTimeInMillis;
+  private long lastTimeToBeReadyInMillis = 0;
+  private final Clock clock;
+  private String serviceName;
+  private Runnable periodicalTask;
+  private volatile boolean runImmediately = false;
+
+  private BackgroundSCMService(
+      final Clock clock, final SCMContext scmContext,
+      final String serviceName, final long intervalInMillis,
+      final long waitTimeInMillis, final Runnable task) {
+    this.scmContext = scmContext;
+    this.clock = clock;
+    this.periodicalTask = task;
+    this.serviceName = serviceName;
+    this.log = LoggerFactory.getLogger(serviceName);
+    this.intervalInMillis = intervalInMillis;
+    this.waitTimeInMillis = waitTimeInMillis;
+    start();
+  }
+
+  @Override
+  public void start() {
+    if (!running.compareAndSet(false, true)) {
+      log.info("{} Service is already running, skip start.", getServiceName());
+      return;
+    }
+    log.info("Starting {} Service.", getServiceName());
+
+    backgroundThread = new Thread(this::run);
+    backgroundThread.setName(serviceName + "Thread");
+    backgroundThread.setDaemon(true);
+    backgroundThread.start();
+  }
+
+  @Override
+  public void notifyStatusChanged() {
+    serviceLock.lock();
+    try {
+      if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()) {
+        if (serviceStatus != ServiceStatus.RUNNING) {
+          log.info("Service {} transitions to RUNNING.", getServiceName());
+          serviceStatus = ServiceStatus.RUNNING;
+          lastTimeToBeReadyInMillis = clock.millis();
+        }
+      } else {
+        if (serviceStatus != ServiceStatus.PAUSING) {
+          log.info("Service {} transitions to PAUSING.", getServiceName());
+          serviceStatus = ServiceStatus.PAUSING;
+        }
+      }
+    } finally {
+      serviceLock.unlock();
+    }
+  }
+
+  private void run() {
+    while (running.get()) {
+      try {
+        if (shouldRun()) {
+          periodicalTask.run();
+        }
+        synchronized (this) {
+          if (!runImmediately) {
+            wait(intervalInMillis);
+          }
+          runImmediately = false;
+        }
+      } catch (InterruptedException e) {

Review Comment:
   Also I am not sure if a runnable is allowed to throw checked exceptions? Can it only throw RuntimeExceptions? Maybe its enough to catch and log RuntimeException and then re-throw? I am not sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3445:
URL: https://github.com/apache/ozone/pull/3445#issuecomment-1136455457

   Nice work @JacksonYao287 and @sodonnel 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
sodonnel merged PR #3445:
URL: https://github.com/apache/ozone/pull/3445


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3445: HDDS-6771. EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3445:
URL: https://github.com/apache/ozone/pull/3445#issuecomment-1136546001

   thanks @sodonnel and @umamaheswararao for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org