You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/08/03 19:25:55 UTC

[GitHub] [hbase] wchevreuil opened a new pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

wchevreuil opened a new pull request #2191:
URL: https://github.com/apache/hbase/pull/2191


   …nSourceManager upon termination


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r487113596



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-692629711


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 52s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  2s |  master passed  |
   | -0 :warning: |  patch  |   2m 11s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 22s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 27s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 46s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 16s |  The patch does not generate ASF License warnings.  |
   |  |   |  35m 18s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/12/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 1e60f1a859fd 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / a3f40287ad |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/12/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-692612301


   > I left a few suggestions, but I think this is also looks fine as-is.
   
   Thanks for reviewing it @joshelser . Added the TRACE logging, replied on the other suggestions. Mind share another thought on those replies?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-688292670


   Thanks for the review, @busbey , had pushed a new commit addressing those. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-669310988


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  6s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  5s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  4s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 39s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 49s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 33s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 10s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 10s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   7m  1s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 45s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 138m 26s |  hbase-server in the patch passed.  |
   |  |   | 167m 36s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 8ea11217c6ca 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2f5a5f27b |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/testReport/ |
   | Max. process+thread count | 4098 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r469391582



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -600,6 +600,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
         if (worker.entryReader.isAlive()) {
           worker.entryReader.interrupt();
         }
+      } else {
+        //If worker is already stopped but there was still entries batched,
+        //wee need to clear buffer used for non processed entries

Review comment:
       oops...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache9 commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache9 commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r470374275



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -650,6 +650,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
         if (worker.entryReader.isAlive()) {
           worker.entryReader.interrupt();
         }
+      } else {
+        //If worker is already stopped but there was still entries batched,
+        //we need to clear buffer used for non processed entries
+        worker.clearWALEntryBatch();

Review comment:
       We will not do this for the above condition branch?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r469382747



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
##########
@@ -309,6 +310,16 @@ public WALEntryBatch poll(long timeout) throws InterruptedException {
     return entryBatchQueue.poll(timeout, TimeUnit.MILLISECONDS);
   }
 
+  public void clearWALEntryBatch() {

Review comment:
       We are only calling it from `ReplicationSource.terminate` (see line #606), after we certify that neither the shipper, nor the reader threads are alive anymore, so I don't think it would be an issue. Of course, there's the risk someone inadvertently call this method somewhere else, so maybe we should put a warning comment? I don't think there's any gain of synchronising accesses to totalBufferUsed variable here, concurrent threads could still succeed on the double decrement, if we call clearWALEntryBatch while shipper thread is still running.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-688311444


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  1s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  7s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  7s |  master passed  |
   | -0 :warning: |  patch  |   2m 15s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 45s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   1m 11s |  hbase-server: The patch generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  12m 17s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 15s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 12s |  The patch does not generate ASF License warnings.  |
   |  |   |  35m 32s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/10/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 69ab3d9b3ba9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 1e8db480b3 |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/10/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 84 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/10/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r484360392



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -651,6 +647,20 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
           worker.entryReader.interrupt();
         }
       }
+      //block this thread until worker thread is interrupted
+      while(worker.isAlive()){
+        try {
+          // Wait worker to stop
+          Thread.sleep(this.sleepForRetries);
+        } catch (InterruptedException e) {
+          LOG.info("{} Interrupted while waiting {} to stop", logPeerId(), worker.getName());
+          Thread.currentThread().interrupt();
+        }
+      }
+      //If worker is already stopped but there was still entries batched,
+      //we need to clear buffer used for non processed entries
+      worker.clearWALEntryBatch();

Review comment:
       Indeed, removing this block.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r469395639



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
##########
@@ -309,6 +310,16 @@ public WALEntryBatch poll(long timeout) throws InterruptedException {
     return entryBatchQueue.poll(timeout, TimeUnit.MILLISECONDS);
   }
 
+  public void clearWALEntryBatch() {

Review comment:
       > after we certify that neither the shipper, nor the reader threads are alive anymore, so I don't think it would be an issue. Of course, there's the risk someone inadvertently call this method somewhere else, so maybe we should put a warning comment?
   
   Also OK to just put a warning if moving this check doesn't make things more clear :). Thanks for clarifying for me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r468152005



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -600,6 +600,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
         if (worker.entryReader.isAlive()) {
           worker.entryReader.interrupt();
         }
+      } else {
+        //If worker is already stopped but there was still entries batched,
+        //wee need to clear buffer used for non processed entries

Review comment:
       nit: `we`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
##########
@@ -309,6 +310,16 @@ public WALEntryBatch poll(long timeout) throws InterruptedException {
     return entryBatchQueue.poll(timeout, TimeUnit.MILLISECONDS);
   }
 
+  public void clearWALEntryBatch() {

Review comment:
       Should this be atomic? Is there anything to prevent two racing threads: one calling this `clearWALEntryBatch()` method and another trying to normally consume the next `WALEntryBatch`?
   
   I think this could lead to a double-decrement.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
##########
@@ -309,6 +310,16 @@ public WALEntryBatch poll(long timeout) throws InterruptedException {
     return entryBatchQueue.poll(timeout, TimeUnit.MILLISECONDS);
   }
 
+  public void clearWALEntryBatch() {
+    entryBatchQueue.forEach(w -> {
+      entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {

Review comment:
       What about summing the total size to decrement and then making the one call to `totalBufferUsed`? Guessing that might be micro-optimized faster.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-668271223


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 46s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 33s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 25s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 56s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 56s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 32s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 145m 33s |  hbase-server in the patch passed.  |
   |  |   | 169m 24s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux b067ede1f97f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 21a0b8ea11 |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/testReport/ |
   | Max. process+thread count | 4000 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r487111250



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-697860614


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   2m 30s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 30s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m 16s |  master passed  |
   | -0 :warning: |  patch  |   2m 25s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 58s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  13m 40s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 31s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 18s |  The patch does not generate ASF License warnings.  |
   |  |   |  40m 38s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/14/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 482ebbebfd74 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / dca0b593cf |
   | Max. process+thread count | 95 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/14/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-668496996


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  8s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 43s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 25s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   1m  4s |  hbase-server: The patch generated 3 new + 4 unchanged - 6 fixed = 7 total (was 10)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m  2s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m  6s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 15s |  The patch does not generate ASF License warnings.  |
   |  |   |  33m  7s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 52acb81cf204 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2f5a5f27b |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-669309528


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 28s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 41s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 56s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 33s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 23s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 37s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 140m 32s |  hbase-server in the patch passed.  |
   |  |   | 164m 16s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux ddd32ec70b95 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2f5a5f27b |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/testReport/ |
   | Max. process+thread count | 4888 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r493857508



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Nope, all good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-674171455


   > Can we do this at the end of the run method of ReplicationSourceShipper?
   
   When removing the source, ReplicationSource.terminate will interrupt the shipper thread and the end of ReplicationSourceShipper.run might not be executed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r469391760



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
##########
@@ -309,6 +310,16 @@ public WALEntryBatch poll(long timeout) throws InterruptedException {
     return entryBatchQueue.poll(timeout, TimeUnit.MILLISECONDS);
   }
 
+  public void clearWALEntryBatch() {
+    entryBatchQueue.forEach(w -> {
+      entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {

Review comment:
       Makes sense, coming soon...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r484362883



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,39 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTE</b> This method should be only called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.

Review comment:
       Ack.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil merged pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil merged pull request #2191:
URL: https://github.com/apache/hbase/pull/2191


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r484354917



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
-    for (ReplicationSourceShipper worker : workers) {
-      worker.stopWorker();
-      if(worker.entryReader != null) {
-        worker.entryReader.setReaderRunning(false);
-      }
-    }
 
     for (ReplicationSourceShipper worker : workers) {
+      worker.stopWorker();
+      worker.entryReader.setReaderRunning(false);

Review comment:
       I can't think of a valid scenario when source would be running without setting an entry reader in the shipper, but let me bring back the null check, for safety.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r488543053



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Good idea, added in the last commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-669234775


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 27s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  1s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 45s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 20s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  hbase-server: The patch generated 0 new + 4 unchanged - 6 fixed = 4 total (was 10)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m  7s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m  6s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 15s |  The patch does not generate ASF License warnings.  |
   |  |   |  32m 11s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 852de549ebe9 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2f5a5f27b |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-682057865


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 40s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 46s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m 31s |  master passed  |
   | -0 :warning: |  patch  |   2m 40s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 32s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  14m 26s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 47s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 15s |  The patch does not generate ASF License warnings.  |
   |  |   |  42m 49s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/9/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux ce7e9800036f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 047e0618d2 |
   | Max. process+thread count | 84 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/9/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r484354917



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
-    for (ReplicationSourceShipper worker : workers) {
-      worker.stopWorker();
-      if(worker.entryReader != null) {
-        worker.entryReader.setReaderRunning(false);
-      }
-    }
 
     for (ReplicationSourceShipper worker : workers) {
+      worker.stopWorker();
+      worker.entryReader.setReaderRunning(false);

Review comment:
       I can't think of a valid scenario when source would be running without setting an entry reader in the shipper, but let me bring back the null check, for safety. I originally removed it because I didn't think the check makes sense, as few lines below we are referring "worker.entryReader.isAlive()" directly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-668270078


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 10s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 17s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 50s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 43s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  1s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 41s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 40s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 139m  5s |  hbase-server in the patch passed.  |
   |  |   | 165m 52s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux d49654b5f398 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 21a0b8ea11 |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/testReport/ |
   | Max. process+thread count | 4528 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-668551955


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 24s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 35s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 29s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 30s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 142m 41s |  hbase-server in the patch passed.  |
   |  |   | 166m 20s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6dcb394db800 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2f5a5f27b |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/testReport/ |
   | Max. process+thread count | 3896 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-673491296


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  6s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  3s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m  8s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  master passed  |
   | -0 :warning: |  patch  |   6m 55s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 43s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 57s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 57s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m  9s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 213m 47s |  hbase-server in the patch failed.  |
   |  |   | 239m 57s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 21c4556c297b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2eb69df77 |
   | Default Java | 1.8.0_232 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/testReport/ |
   | Max. process+thread count | 2967 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r470681579



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -650,6 +650,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
         if (worker.entryReader.isAlive()) {
           worker.entryReader.interrupt();
         }
+      } else {
+        //If worker is already stopped but there was still entries batched,
+        //we need to clear buffer used for non processed entries
+        worker.clearWALEntryBatch();

Review comment:
       We don't want to clear entry batch if shipper thread is still running, because it may race for the bufferUsage update. However, I realised we need to go through the branch first to interrupt it, got confused by an additional loop calling `stopWorker` prior to this one. I'm fixing this on a following commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-673458194


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  8s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  0s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 20s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 42s |  hbase-server in master failed.  |
   | -0 :warning: |  patch  |   7m 13s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 18s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 25s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 46s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 149m 55s |  hbase-server in the patch failed.  |
   |  |   | 178m 14s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux b8c3e7b2881c 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2eb69df77 |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/testReport/ |
   | Max. process+thread count | 4384 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-697235648


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 38s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 58s |  master passed  |
   | -0 :warning: |  patch  |   2m  7s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 25s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 20s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 15s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 12s |  The patch does not generate ASF License warnings.  |
   |  |   |  32m 46s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/13/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux f132ca3d766c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / dca0b593cf |
   | Max. process+thread count | 94 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/13/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-674169286


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 44s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  4s |  master passed  |
   | -0 :warning: |  patch  |   2m 13s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 31s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   1m  6s |  hbase-server: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 27s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 14s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 12s |  The patch does not generate ASF License warnings.  |
   |  |   |  33m 24s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/7/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 8d4bb164936c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / c81ef7368e |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/7/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/7/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-682051807


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 35s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 38s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  2s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 42s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  master passed  |
   | -0 :warning: |  patch  |   7m 33s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 28s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 41s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  |   6m 42s |  hbase-server in the patch failed.  |
   |  |   |  32m  5s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/9/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 052bf4ab69ed 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 047e0618d2 |
   | Default Java | 1.8.0_232 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/9/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/9/testReport/ |
   | Max. process+thread count | 804 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/9/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-680765886


   retest build


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r488539725



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Sorry for not explaining this before. Yeah, just thought there was not real gains on doing two iterations, as at the end, we would need to make sure all workers were indeed stopped. I can revert it back, if you perfer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-668549855


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 18s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 50s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 46s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  9s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  9s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m  4s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 42s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 133m 51s |  hbase-server in the patch passed.  |
   |  |   | 160m 38s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux a1647e3a638b 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2f5a5f27b |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/testReport/ |
   | Max. process+thread count | 4138 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-673400498


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 35s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m  0s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 29s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m 50s |  master passed  |
   | -0 :warning: |  patch  |   3m  0s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 30s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  14m 44s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 40s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 14s |  The patch does not generate ASF License warnings.  |
   |  |   |  43m 17s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 7e2fddf0496d 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d2eb69df77 |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/6/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r487112532



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r488540526



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       So, thought about following the exceptional scenarios approach already implemented in ReplicationSource, where the source thread is interrupted. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-680020351


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  1s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 30s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  1s |  master passed  |
   | -0 :warning: |  patch  |   2m  9s |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 22s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 15s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m  7s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 16s |  The patch does not generate ASF License warnings.  |
   |  |   |  32m 32s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/8/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux cd662e374d51 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 6b0707f541 |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/8/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r493859002



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Ah, so propagating the interrupt state is important, outside of this method. If you took my suggestion, we'd just be duplicating the `catch` block, so this is probably fine as-is.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] Apache-HBase commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

Apache-HBase commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-668217366


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 44s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  7s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  0s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 21s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   1m  3s |  hbase-server: The patch generated 5 new + 4 unchanged - 6 fixed = 9 total (was 10)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m  2s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m  8s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 15s |  The patch does not generate ASF License warnings.  |
   |  |   |  32m 16s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2191 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 7417cc74f102 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 21a0b8ea11 |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2191/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r487111250



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();

Review comment:
       Could just `break` instead of interrupting this thread.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,51 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTES</b>
+   * 1) This method should only be called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   *
+   * 2) It <b>does not</b> attempt to terminate reader and shipper threads. Those <b>must</b>
+   * have been triggered interruption/termination prior to calling this method.
+   */
+  void clearWALEntryBatch() {
+    long timeout = System.currentTimeMillis() + this.shipEditsTimeout;
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        if (System.currentTimeMillis() >= timeout) {
+          LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+            + "thread to stop.", this.source.getPeerId());
+          Thread.currentThread().interrupt();
+        } else {
+          // Wait both shipper and reader threads to stop
+          Thread.sleep(this.sleepForRetries);
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
+          this.source.getPeerId(), this.getName(), e);
+        Thread.currentThread().interrupt();
+      }
+    }
+    LongAccumulator totalToDecrement = new LongAccumulator((a,b) -> a + b, 0);
+    entryReader.entryBatchQueue.forEach(w -> {
+      entryReader.entryBatchQueue.remove(w);
+      w.getWalEntries().forEach(e -> {
+        long entrySizeExcludeBulkLoad = entryReader.getEntrySizeExcludeBulkLoad(e);
+        totalToDecrement.accumulate(entrySizeExcludeBulkLoad);
+      });
+    });
+
+    source.getSourceManager().getTotalBufferUsed().addAndGet(-totalToDecrement.longValue());

Review comment:
       Thanks for updating the decrement logic.
   
   Maybe a TRACE log message to indicate the amount of buffer reclaimed as a part of shutting down. Sounds like that might be helpful in the future. e.g.
   
   `LOG.trace("Decrementing totalBufferUsed by {}B while stopping Replication WAL Readers", totalToDecrement.longValue())`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] busbey commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

busbey commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r483374802



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,10 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
-    for (ReplicationSourceShipper worker : workers) {
-      worker.stopWorker();
-      if(worker.entryReader != null) {
-        worker.entryReader.setReaderRunning(false);
-      }
-    }
 
     for (ReplicationSourceShipper worker : workers) {
+      worker.stopWorker();
+      worker.entryReader.setReaderRunning(false);

Review comment:
       we don't need the null check still?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,39 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTE</b> This method should be only called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   */
+  void clearWALEntryBatch() {
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        // Wait both shipper and reader threads to stop
+        Thread.sleep(this.sleepForRetries);
+      } catch (InterruptedException e) {
+        LOG.info("{} Interrupted while waiting {} to stop on clearWALEntryBatch",
+          this.source.getPeerId(), this.getName());
+        Thread.currentThread().interrupt();
+      }
+    }

Review comment:
       should we have a timeout here? or is there a timeout above us that will interrupt if we take too long?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,39 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTE</b> This method should be only called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.

Review comment:
       expressly note that both the worker and the entry reader should have already been interrupted because we're not doing it here.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -651,6 +647,20 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
           worker.entryReader.interrupt();
         }
       }
+      //block this thread until worker thread is interrupted
+      while(worker.isAlive()){
+        try {
+          // Wait worker to stop
+          Thread.sleep(this.sleepForRetries);
+        } catch (InterruptedException e) {
+          LOG.info("{} Interrupted while waiting {} to stop", logPeerId(), worker.getName());
+          Thread.currentThread().interrupt();
+        }
+      }
+      //If worker is already stopped but there was still entries batched,
+      //we need to clear buffer used for non processed entries
+      worker.clearWALEntryBatch();

Review comment:
       given that `clearWALEntryBatch` will wait for the worker to not be alive, why are we waiting for it here as well?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,39 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTE</b> This method should be only called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   */
+  void clearWALEntryBatch() {
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        // Wait both shipper and reader threads to stop
+        Thread.sleep(this.sleepForRetries);
+      } catch (InterruptedException e) {
+        LOG.info("{} Interrupted while waiting {} to stop on clearWALEntryBatch",

Review comment:
       is info the right level here? maybe it is? but if we get interrupted that means we could go to do the update below in a racy way with the other threads right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#issuecomment-673375360


   retest build


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r484370875



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,39 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTE</b> This method should be only called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   */
+  void clearWALEntryBatch() {
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        // Wait both shipper and reader threads to stop
+        Thread.sleep(this.sleepForRetries);
+      } catch (InterruptedException e) {
+        LOG.info("{} Interrupted while waiting {} to stop on clearWALEntryBatch",

Review comment:
       > is info the right level here? maybe it is? 
   
   We should probably raise it to a warn here, because that meant the source termination didn't complete properly and we might not have "cleared" this source pending entries size from source manager buffer.
   
   > but if we get interrupted that means we could go to do the update below in a racy way with the other threads right?
   
   I don't think it would be the case. We may have multiple shippers for a source, but we access this method in the isolated context of the source thread. On the case we are terminating multiple sources (due to recovered queues or multiple peers), we are still calculating entries size to be decreased in the context of each source thread. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r469394487



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
##########
@@ -309,6 +310,16 @@ public WALEntryBatch poll(long timeout) throws InterruptedException {
     return entryBatchQueue.poll(timeout, TimeUnit.MILLISECONDS);
   }
 
+  public void clearWALEntryBatch() {

Review comment:
       > We are only calling it from ReplicationSource.terminate (see line #606), after we certify that neither the shipper, nor the reader threads are alive anymore
   
   Actually, we could move that check to clearWALEntryBatch method itself, and since shipper has a reference to reader, but not the other way around, we can move it to shipper, instead? Let me give it a try.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r484383560



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java
##########
@@ -324,4 +326,39 @@ void stopWorker() {
   public boolean isFinished() {
     return state == WorkerState.FINISHED;
   }
+
+  /**
+   * Attempts to properly update <code>ReplicationSourceManager.totalBufferUser</code>,
+   * in case there were unprocessed entries batched by the reader to the shipper,
+   * but the shipper didn't manage to ship those because the replication source is being terminated.
+   * In that case, it iterates through the batched entries and decrease the pending
+   * entries size from <code>ReplicationSourceManager.totalBufferUser</code>
+   * <p/>
+   * <b>NOTE</b> This method should be only called upon replication source termination.
+   * It blocks waiting for both shipper and reader threads termination,
+   * to make sure no race conditions
+   * when updating <code>ReplicationSourceManager.totalBufferUser</code>.
+   */
+  void clearWALEntryBatch() {
+    while(this.isAlive() || this.entryReader.isAlive()){
+      try {
+        // Wait both shipper and reader threads to stop
+        Thread.sleep(this.sleepForRetries);
+      } catch (InterruptedException e) {
+        LOG.info("{} Interrupted while waiting {} to stop on clearWALEntryBatch",
+          this.source.getPeerId(), this.getName());
+        Thread.currentThread().interrupt();
+      }
+    }

Review comment:
       That's a good point, we don't timeout these locks anywhere, AFICS. Let me put one here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] wchevreuil merged pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

wchevreuil merged pull request #2191:
URL: https://github.com/apache/hbase/pull/2191


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hbase] joshelser commented on a change in pull request #2191: HBASE-24813 ReplicationSource should clear buffer usage on Replicatio…

Posted by GitBox <gi...@apache.org>.

joshelser commented on a change in pull request #2191:
URL: https://github.com/apache/hbase/pull/2191#discussion_r487111250



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
##########
@@ -626,14 +626,12 @@ public void terminate(String reason, Exception cause, boolean clearMetrics, bool
       Threads.shutdown(initThread, this.sleepForRetries);
     }
     Collection<ReplicationSourceShipper> workers = workerThreads.values();
+
     for (ReplicationSourceShipper worker : workers) {
       worker.stopWorker();
-      if(worker.entryReader != null) {
+      if (worker.entryReader != null) {
         worker.entryReader.setReaderRunning(false);
       }
-    }
-
-    for (ReplicationSourceShipper worker : workers) {

Review comment:
       Semantics of this have changed, but I'm not seeing conversation that indicates that it was intentional.
   
   Before: we would stop all workers, then wait for them all to be stopped. Each worker could stop itself concurrently. Now, for each worker, we request a stop and then wait for it to be stopped, then move on to the next worker.
   
   I don't _think_ this is a big deal, but wanted to call it out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org