Posted to reviews@helix.apache.org by "qqu0127 (via GitHub)" <gi...@apache.org> on 2023/01/23 17:46:47 UTC

[GitHub] [helix] qqu0127 commented on a diff in pull request #2344: Added new metric to report real time missing top state for partition

qqu0127 commented on code in PR #2344:
URL: https://github.com/apache/helix/pull/2344#discussion_r1084318326


##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -55,6 +55,54 @@
 import org.slf4j.LoggerFactory;
 
 public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
+  private class AsyncMissingTopStateMonitor extends Thread {
+    private final Map<String, Map<String, Long>> _missingTopStateResourceMap;

Review Comment:
   Does the map have to be a concurrent map? Any risk of concurrent modification?



##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -55,6 +55,54 @@
 import org.slf4j.LoggerFactory;
 
 public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
+  private class AsyncMissingTopStateMonitor extends Thread {
+    private final Map<String, Map<String, Long>> _missingTopStateResourceMap;
+    private long _missingTopStateDurationThreshold = Long.MAX_VALUE;;
+
+    public AsyncMissingTopStateMonitor(Map<String, Map<String, Long>> missingTopStateResourceMap) {
+      _missingTopStateResourceMap = missingTopStateResourceMap;
+    }
+
+    public void setMissingTopStateDurationThreshold(long missingTopStateDurationThreshold) {
+      _missingTopStateDurationThreshold = missingTopStateDurationThreshold;
+    }
+
+    @Override
+    public void run() {
+      try {
+        synchronized (this) {
+          while (true) {
+            while (_missingTopStateResourceMap.size() == 0) {
+              this.wait();
+            }
+            for (Iterator<Map.Entry<String, Map<String, Long>>> resourcePartitionIt =
+                _missingTopStateResourceMap.entrySet().iterator(); resourcePartitionIt.hasNext(); ) {
+              Map.Entry<String, Map<String, Long>> resourcePartitionEntry = resourcePartitionIt.next();
+              // Iterate over all partitions and if any partition has missing top state greater than threshold then report
+              // it.
+              ResourceMonitor resourceMonitor = getOrCreateResourceMonitor(resourcePartitionEntry.getKey());
+              // If all partitions of resource has top state recovered then reset the counter
+              if (resourcePartitionEntry.getValue().isEmpty()) {
+                resourceMonitor.resetOneOrManyPartitionsMissingTopStateRealTimeGuage();
+                resourcePartitionIt.remove();
+              } else {

Review Comment:
   I'm not a fan of this iterate-and-modify pattern. It's not wrong, but is there any chance we can make it a pure function?
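
   A minimal sketch of what a pure version of the threshold check could look like, assuming the goal stated in the code comment above (report a resource when any of its partitions has been missing its top state for longer than the threshold). The class name MissingTopStateReport, the method name, and the nowMs/thresholdMs parameters are hypothetical and not part of the PR; the method only reads the map and returns a result, leaving gauge updates and map cleanup to the caller.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not Helix code: computes what to report without mutating its input.
final class MissingTopStateReport {
  // Returns the names of resources that have at least one partition whose top state
  // has been missing for longer than thresholdMs, measured at nowMs.
  static Set<String> resourcesOverThreshold(
      Map<String, Map<String, Long>> missingTopStateResourceMap, long nowMs, long thresholdMs) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Map<String, Long>> resource : missingTopStateResourceMap.entrySet()) {
      for (long startMs : resource.getValue().values()) {
        if (nowMs - startMs > thresholdMs) {
          result.add(resource.getKey());
          break; // one offending partition is enough to report the resource
        }
      }
    }
    return result;
  }
}
```

   With this shape, the monitor thread would call the function on each pass and update gauges from the returned set, and no iterator removal is needed inside the loop.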



##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -583,6 +643,33 @@ public void updateMissingTopStateDurationStats(String resourceName, long totalDu
     }
   }
 
+  public void updateMissingTopStateDurationThreshold(long missingTopStateDurationThreshold) {
+    _asyncMissingTopStateMonitor.setMissingTopStateDurationThreshold(missingTopStateDurationThreshold);
+  }
+
+  public void updateMissingTopStateResourceMap(String resourceName,String partitionName, boolean isTopStateMissing, long startTime) {
+    // Top state started missing
+    if (isTopStateMissing) {
+      // Wake up asyncMissingTopStateMonitor thread on first resource being added to map
+      if (_missingTopStateResourceMap.isEmpty()) {
+        synchronized (_asyncMissingTopStateMonitor) {
+          _asyncMissingTopStateMonitor.notify();
+        }
+      }
+      if (!_missingTopStateResourceMap.containsKey(resourceName)) {
+        _missingTopStateResourceMap.put(resourceName, new HashMap<String, Long>());
+      }

Review Comment:
   Can this be reduced to computeIfAbsent?
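
   For illustration, a minimal sketch of the computeIfAbsent form. The class MissingTopStateTracker and its method names are hypothetical stand-ins, not code from the PR; the inner HashMap mirrors the diff as written and is not a statement about what the final patch should use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the tracking logic in ClusterStatusMonitor; not Helix code.
final class MissingTopStateTracker {
  // resourceName -> (partitionName -> startTimeOfMissingTopState)
  private final Map<String, Map<String, Long>> missingTopStateResourceMap =
      new ConcurrentHashMap<>();

  // Records that a partition's top state went missing at startTime.
  void recordMissing(String resourceName, String partitionName, long startTime) {
    // computeIfAbsent creates the inner map on first use and returns the existing
    // one otherwise, replacing the containsKey / put / get sequence with one call.
    missingTopStateResourceMap
        .computeIfAbsent(resourceName, name -> new HashMap<>())
        .put(partitionName, startTime);
  }

  // Exposes the underlying map for inspection in this sketch.
  Map<String, Map<String, Long>> view() {
    return missingTopStateResourceMap;
  }
}
```

   On a ConcurrentHashMap the computeIfAbsent call is atomic, so two threads racing on the same resource name cannot each install a different inner map, which the containsKey-then-put sequence allows.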



##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -109,9 +157,21 @@ public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
 
   private final Map<String, JobMonitor> _perTypeJobMonitorMap = new ConcurrentHashMap<>();
 
+  /**
+   * Missing top state resource map: resourceName-><PartitionName->startTimeOfMissingTopState>
+   */
+  private final Map<String, Map<String, Long>> _missingTopStateResourceMap = new ConcurrentHashMap<>();
+  private final AsyncMissingTopStateMonitor _asyncMissingTopStateMonitor = new AsyncMissingTopStateMonitor(_missingTopStateResourceMap);

Review Comment:
   Now I see it's indeed a ConcurrentHashMap.
   One nit: let's make this type explicit in the signature.



##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -583,6 +643,33 @@ public void updateMissingTopStateDurationStats(String resourceName, long totalDu
     }
   }
 
+  public void updateMissingTopStateDurationThreshold(long missingTopStateDurationThreshold) {
+    _asyncMissingTopStateMonitor.setMissingTopStateDurationThreshold(missingTopStateDurationThreshold);
+  }
+
+  public void updateMissingTopStateResourceMap(String resourceName,String partitionName, boolean isTopStateMissing, long startTime) {
+    // Top state started missing
+    if (isTopStateMissing) {
+      // Wake up asyncMissingTopStateMonitor thread on first resource being added to map
+      if (_missingTopStateResourceMap.isEmpty()) {
+        synchronized (_asyncMissingTopStateMonitor) {
+          _asyncMissingTopStateMonitor.notify();
+        }
+      }
+      if (!_missingTopStateResourceMap.containsKey(resourceName)) {
+        _missingTopStateResourceMap.put(resourceName, new HashMap<String, Long>());
+      }
+      _missingTopStateResourceMap.get(resourceName).put(partitionName, startTime);
+    } else { // top state recovered
+      // remove partitions from resourceMap whose top state has been recovered, this will put
+      // asyncMissingTopStateMonitor thread to sleep when no resources left to monitor.
+      Map<String, Long> entry = _missingTopStateResourceMap.get(resourceName);
+      if (entry != null) {
+        entry.remove(partitionName);
+      }

Review Comment:
   Related to the comment above on the iterator: can we do all the GC-like map cleanup here?
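
   One way the cleanup could live at the recovery site, sketched under assumptions: the class TopStateRecoveryCleanup and the method markRecovered are hypothetical names, and the boolean return is only a suggestion for how a caller might decide when to reset the gauge. It relies on the documented behavior of computeIfPresent, where returning null from the remapping function removes the entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not Helix code: removes a recovered partition and drops
// the resource entry once no partitions remain, all in one atomic map operation.
final class TopStateRecoveryCleanup {
  // Returns true when no missing-top-state partitions remain tracked for resourceName
  // (including the case where the resource was never tracked).
  static boolean markRecovered(
      ConcurrentHashMap<String, Map<String, Long>> missingTopStateResourceMap,
      String resourceName,
      String partitionName) {
    missingTopStateResourceMap.computeIfPresent(resourceName, (name, partitions) -> {
      partitions.remove(partitionName);
      // Returning null removes the resource entry from the outer map.
      return partitions.isEmpty() ? null : partitions;
    });
    return !missingTopStateResourceMap.containsKey(resourceName);
  }
}
```

   With the empty-entry removal done here, the monitor thread's loop would not need resourcePartitionIt.remove(), and the caller could reset the resource's gauge when the method returns true.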



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
