Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/08/04 03:28:11 UTC

[GitHub] [hbase] busbey commented on a change in pull request #2193: HBASE-24779 Report on the WAL edit buffer usage/limit for replication

busbey commented on a change in pull request #2193:
URL: https://github.com/apache/hbase/pull/2193#discussion_r464770288



##########
File path: hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsReplicationSourceSourceImpl.java
##########
@@ -314,4 +314,15 @@ public String getMetricsName() {
   @Override public long getEditsFiltered() {
     return this.walEditsFilteredCounter.value();
   }
+
+  @Override
+  public void setWALReaderEditsBufferBytes(long usage) {
+    //noop. Global limit, tracked globally. Do not need per-source metrics

Review comment:
       nit: wouldn't it still be useful to know whether particular sources are eating up the global limit? Or do we not track enough information yet to do that?
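
A rough sketch of the per-source accounting I have in mind, assuming each source reports the bytes it buffers and releases. `BufferUsageTracker`, `acquire`, and `release` are hypothetical names for illustration, not existing HBase classes, and the real accounting in the replication code may be structured differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: not an HBase class.
class BufferUsageTracker {
  // Bytes buffered across all sources (what the global limit is checked against).
  private final AtomicLong totalBytes = new AtomicLong();
  // Bytes buffered by each source, keyed by an assumed source/peer id.
  private final Map<String, AtomicLong> bytesBySource = new ConcurrentHashMap<>();

  /** Records that the given source buffered {@code bytes} more bytes; returns the new global total. */
  long acquire(String sourceId, long bytes) {
    bytesBySource.computeIfAbsent(sourceId, id -> new AtomicLong()).addAndGet(bytes);
    return totalBytes.addAndGet(bytes);
  }

  /** Records that the given source shipped or dropped {@code bytes} bytes; returns the new global total. */
  long release(String sourceId, long bytes) {
    return acquire(sourceId, -bytes);
  }

  long getTotalBytes() {
    return totalBytes.get();
  }

  /** Returns the bytes currently attributed to one source, or 0 if it has never reported. */
  long getBytesForSource(String sourceId) {
    AtomicLong usage = bytesBySource.get(sourceId);
    return usage == null ? 0L : usage.get();
  }
}
```

With something like this, a per-source gauge could expose `getBytesForSource(...)` while the global gauge keeps reporting `getTotalBytes()`.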

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java
##########
@@ -497,6 +497,33 @@ public boolean canReplicateToSameCluster() {
     }
   }
 
+  public static class SleepingReplicationEndpointForTest extends ReplicationEndpointForTest {

Review comment:
       AFAICT this only gets used in manual testing? If so, add a comment explaining how to do that, something like your example for this PR.

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java
##########
@@ -497,6 +497,33 @@ public boolean canReplicateToSameCluster() {
     }
   }
 
+  public static class SleepingReplicationEndpointForTest extends ReplicationEndpointForTest {
+    private long duration;
+    public SleepingReplicationEndpointForTest() {
+      super();
+    }
+
+    @Override
+    public void init(Context context) throws IOException {
+      super.init(context);
+      if (this.ctx != null) {
+        duration = this.ctx.getConfiguration().getLong(
+            "test.sleep.replication.endpoint.duration.millis", 5000L);

Review comment:
       nit: prefix the key with `hbase.`, i.e. `hbase.test.sleep.replication.endpoint.duration.millis`
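
A minimal sketch of the lookup with the prefixed key pulled into a constant. To keep it compilable without HBase on the classpath, `java.util.Properties` stands in for the Hadoop `Configuration`; `SleepDurationConfig` and its members are hypothetical names, and the 5000 ms default is the one from the diff.

```java
import java.util.Properties;

// Hypothetical helper for illustration; Properties replaces Hadoop's Configuration here.
class SleepDurationConfig {
  static final String SLEEP_DURATION_KEY =
      "hbase.test.sleep.replication.endpoint.duration.millis";
  static final long DEFAULT_SLEEP_DURATION_MS = 5000L;

  /** Returns the configured sleep duration in milliseconds, or the default when unset. */
  static long getSleepDurationMillis(Properties conf) {
    String value = conf.getProperty(SLEEP_DURATION_KEY);
    return value == null ? DEFAULT_SLEEP_DURATION_MS : Long.parseLong(value.trim());
  }
}
```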

##########
File path: hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsReplicationSourceSource.java
##########
@@ -76,4 +77,13 @@
   long getWALEditsRead();
   long getShippedOps();
   long getEditsFiltered();
+  /**
+   * Sets the total usage of memory used by edits in memory read from WALs.
+   * @param usage The memory used by edits in bytes
+   */
+  void setWALReaderEditsBufferBytes(long usage);

Review comment:
       If we only want the global metric, why add this here instead of just on `MetricsReplicationGlobalSourceSource`?
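
Roughly what I mean, as a sketch with simplified stand-in types. `PerSourceMetrics`, `GlobalSourceMetrics`, and `SimpleGlobalSourceMetrics` are hypothetical names, not the real HBase interfaces, and the getter is my own addition for illustration. The point is that the setter lives only on the global type, so per-source implementations never need a no-op.

```java
// Simplified stand-in for the per-source metrics interface; the real one has many more methods.
interface PerSourceMetrics {
  long getEditsFiltered();
}

// Hypothetical global-only interface: the buffer gauge is declared here,
// so per-source implementations do not have to override it with a no-op.
interface GlobalSourceMetrics extends PerSourceMetrics {
  void setWALReaderEditsBufferBytes(long usage);

  long getWALReaderEditsBufferBytes();
}

// Minimal implementation used only to show the shape of the split.
class SimpleGlobalSourceMetrics implements GlobalSourceMetrics {
  private volatile long walReaderEditsBufferBytes;

  @Override
  public long getEditsFiltered() {
    return 0L; // placeholder; unrelated to the buffer gauge
  }

  @Override
  public void setWALReaderEditsBufferBytes(long usage) {
    this.walReaderEditsBufferBytes = usage;
  }

  @Override
  public long getWALReaderEditsBufferBytes() {
    return walReaderEditsBufferBytes;
  }
}
```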

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
##########
@@ -244,17 +248,22 @@ void addHFileRefsToQueue(TableName tableName, byte[] family, List<Pair<Path, Pat
 
     private final ReplicationSink replicationSink;
     private final ReplicationSourceManager replicationManager;
+    private final MetricsReplicationSourceSource globalMetricsSource;
 
     public ReplicationStatisticsTask(ReplicationSink replicationSink,
-        ReplicationSourceManager replicationManager) {
+        ReplicationSourceManager replicationManager, MetricsReplicationSourceSource globalMetricsSource) {
       this.replicationManager = replicationManager;
       this.replicationSink = replicationSink;
+      this.globalMetricsSource = globalMetricsSource;
     }
 
     @Override
     public void run() {
       printStats(this.replicationManager.getStats());
       printStats(this.replicationSink.getStats());
+
+      // Report how much data we've read off disk which is pending replication, across all sources
+      globalMetricsSource.setWALReaderEditsBufferBytes(replicationManager.getTotalBufferUsed().get());

Review comment:
       This feels odd. Why do we only update this one global metric here? Why is it different than the other things tracked in the global metrics?
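
One alternative, sketched under assumptions: push the new total to the gauge at the point where buffer usage changes, instead of sampling it from the periodic stats task. `TrackedBufferUsage` and `add` are hypothetical names, and the `LongConsumer` stands in for something like a method reference to the global metrics setter.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical illustration only: not an HBase class.
class TrackedBufferUsage {
  private final AtomicLong totalBufferUsed = new AtomicLong();
  // Receives the new total after every change, e.g. a reference to the metrics setter.
  private final LongConsumer gauge;

  TrackedBufferUsage(LongConsumer gauge) {
    this.gauge = gauge;
  }

  /** Applies a positive (buffered) or negative (released) delta and publishes the new total. */
  long add(long deltaBytes) {
    long newTotal = totalBufferUsed.addAndGet(deltaBytes);
    // Under contention two publishes may arrive out of order; acceptable for a gauge,
    // since the next update corrects it.
    gauge.accept(newTotal);
    return newTotal;
  }

  long get() {
    return totalBufferUsed.get();
  }
}
```

That would keep this gauge updated the same way as a metric that changes inline with the work, at the cost of one extra call per buffer change.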



