You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2022/04/12 02:13:51 UTC

[GitHub] [hbase] apurtell opened a new pull request, #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

apurtell opened a new pull request, #4338:
URL: https://github.com/apache/hbase/pull/4338

   Compactions might be concurrent against a given store and the Compactor is
   shared among them. Do not put mutable state into shared class fields. All
   Compactor class fields should be final. At the moment 'keepSeqIdPeriod'
   is an exception to this rule because some unit tests change it.
   
   Compactor#getProgress and Compactor#getCompactionTargets now return union
   results of all compactions in progress against the store.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1096123137

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   3m 42s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 24s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 55s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m  9s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 34s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 53s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 187m 22s |  hbase-server in the patch failed.  |
   |  |   | 206m 24s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 251a03d19c4c 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 1d8a5bf0e3 |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/1/testReport/ |
   | Max. process+thread count | 2738 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] joshelser commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
joshelser commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1097047232

   FYI, we have some folks running this on a cluster against S3 now. I'll find their Github IDs to tag them here, so we can keep you up to date on real test runs :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848600124


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   Yes, but see the asserts below:
   
       assert finished : "We should have exited the method on all error paths";
       assert writer != null : "Writer should be non-null if no error";
   
   These were in place even without a `return` here in the previous code. I can add it. It does seem it was (and is) incorrect.
   
   When we drop through to the end of this method there is a finished `writer` we can commit, or so this claims. I will convert those assertions to throws of `IllegalStateException` so we can see if these invariants are not holding even when assertions are not enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098587868

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 56s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 25s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 15s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 59s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m  8s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m  8s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 33s |  hbase-server: The patch generated 1 new + 39 unchanged - 0 fixed = 40 total (was 39)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 23s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  8s |  The patch does not generate ASF License warnings.  |
   |  |   |  29m 46s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile |
   | uname | Linux 609aeac8c873 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f990f56e8e |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | checkstyle | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 64 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/console |
   | versions | git=2.17.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098561420

   Pushed an update that brings all of the above discussion together. 
   - Compactions might be concurrent against a given store and the Compactor is shared among them. Add a comment to the class javadoc to not put mutable state into shared class fields. Make all Compactor class fields final, with the exception of `keepSeqIdPeriod`, which is set by unit tests. 
   - Pass `writer` and `progress` instances through compaction as method parmeters of `performCompaction` to improve MT-safety of these code paths.
   - Scope compaction progress to each compaction to improve MT-safety overall and the accuracy of compaction progress reporting.
   
   All compaction unit tests pass. 
   
       [INFO] -------------------------------------------------------
       [INFO]  T E S T S
       [INFO] -------------------------------------------------------
       [INFO] Running org.apache.hadoop.hbase.util.compaction.TestMajorCompactionTTLRequest
       [INFO] Running org.apache.hadoop.hbase.util.compaction.TestMajorCompactionRequest
       [INFO] Running org.apache.hadoop.hbase.quotas.policies.TestNoWritesCompactionsViolationPolicyEnforcement
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestMinorCompaction
       [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.427 s - in org.apache.hadoop.hbase.quotas.policies.TestNoWritesCompactionsViolationPolicyEnforcement
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestDateTieredCompactionPolicyOverflow
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.942 s - in org.apache.hadoop.hbase.regionserver.TestDateTieredCompactionPolicyOverflow
       [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.878 s - in org.apache.hadoop.hbase.util.compaction.TestMajorCompactionTTLRequest
       [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.918 s - in org.apache.hadoop.hbase.util.compaction.TestMajorCompactionRequest
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionState
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionArchiveConcurrentClose
       [INFO] Running org.apache.hadoop.hbase.regionserver.throttle.TestCompactionWithThroughputController
       [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.935 s - in org.apache.hadoop.hbase.regionserver.TestMinorCompaction
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionWithCoprocessor
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.828 s - in org.apache.hadoop.hbase.regionserver.TestCompactionArchiveConcurrentClose
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionWithByteBuff
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.662 s - in org.apache.hadoop.hbase.regionserver.TestCompactionWithByteBuff
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionFileNotFound
       [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.873 s - in org.apache.hadoop.hbase.regionserver.TestCompactionFileNotFound
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionArchiveIOException
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.024 s - in org.apache.hadoop.hbase.regionserver.TestCompactionArchiveIOException
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionLifeCycleTracker
       [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.897 s - in org.apache.hadoop.hbase.regionserver.throttle.TestCompactionWithThroughputController
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestMajorCompaction
       [WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 16.319 s - in org.apache.hadoop.hbase.regionserver.TestCompactionLifeCycleTracker
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestDateTieredCompactionPolicy
       [INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.984 s - in org.apache.hadoop.hbase.regionserver.TestDateTieredCompactionPolicy
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionInDeadRegionServer
       [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.748 s - in org.apache.hadoop.hbase.regionserver.TestCompactionInDeadRegionServer
       [INFO] Running org.apache.hadoop.hbase.regionserver.querymatcher.TestCompactionScanQueryMatcher
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.896 s - in org.apache.hadoop.hbase.regionserver.querymatcher.TestCompactionScanQueryMatcher
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompaction
       [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 199.59 s - in org.apache.hadoop.hbase.regionserver.TestCompactionWithCoprocessor
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestDateTieredCompactionPolicyHeterogeneousStorage
       [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.228 s - in org.apache.hadoop.hbase.regionserver.TestDateTieredCompactionPolicyHeterogeneousStorage
       [INFO] Running org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy
       [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 219.608 s - in org.apache.hadoop.hbase.regionserver.TestCompactionState
       [INFO] Running org.apache.hadoop.hbase.regionserver.compactions.TestStripeCompactionPolicy
       [INFO] Tests run: 36, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.35 s - in org.apache.hadoop.hbase.regionserver.compactions.TestStripeCompactionPolicy
       [INFO] Running org.apache.hadoop.hbase.regionserver.TestCompactionAfterBulkLoad
       [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.276 s - in org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy
       [INFO] Running org.apache.hadoop.hbase.master.region.TestMasterRegionCompaction
       [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.307 s - in org.apache.hadoop.hbase.regionserver.TestCompactionAfterBulkLoad
       [INFO] Running org.apache.hadoop.hbase.rsgroup.TestRSGroupMajorCompactionTTL
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.496 s - in org.apache.hadoop.hbase.master.region.TestMasterRegionCompaction
       [INFO] Running org.apache.hadoop.hbase.mob.TestMobStoreCompaction
       [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.738 s - in org.apache.hadoop.hbase.mob.TestMobStoreCompaction
       [INFO] Running org.apache.hadoop.hbase.mob.TestMobCompactionWithDefaults
       [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 208.078 s - in org.apache.hadoop.hbase.regionserver.TestCompaction
       [INFO] Running org.apache.hadoop.hbase.mob.TestMobCompactionRegularRegionBatchMode
       [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 115.164 s - in org.apache.hadoop.hbase.rsgroup.TestRSGroupMajorCompactionTTL
       [INFO] Running org.apache.hadoop.hbase.mob.TestMobCompactionOptRegionBatchMode
       [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 187.897 s - in org.apache.hadoop.hbase.mob.TestMobCompactionWithDefaults
       [INFO] Running org.apache.hadoop.hbase.mob.TestMobCompactionOptMode
       [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 179.166 s - in org.apache.hadoop.hbase.mob.TestMobCompactionOptMode
       [INFO] Running org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreadedWithBasicCompaction
       [WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.022 s - in org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreadedWithBasicCompaction
       [INFO] Running org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClientGetCompactionState
       [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 290.012 s - in org.apache.hadoop.hbase.mob.TestMobCompactionOptRegionBatchMode
       [INFO] Running org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreadedWithEagerCompaction
       [WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.026 s - in org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreadedWithEagerCompaction
       [INFO] Running org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientGetCompactionState
       [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.904 s - in org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClientGetCompactionState
       [INFO] Running org.apache.hadoop.hbase.TestAcidGuaranteesWithNoInMemCompaction
       [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 321.549 s - in org.apache.hadoop.hbase.mob.TestMobCompactionRegularRegionBatchMode
       [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.118 s - in org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientGetCompactionState
       [INFO] Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 688.559 s - in org.apache.hadoop.hbase.regionserver.TestMajorCompaction
       [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 201.193 s - in org.apache.hadoop.hbase.TestAcidGuaranteesWithNoInMemCompaction
       [INFO] Results:
       [WARNING] Tests run: 181, Failures: 0, Errors: 0, Skipped: 3


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the 'writer' field to null, and so the method body becomes empty when that field is converted into a parameter, so I removed it. 
   
   If we are going to keep it, we somehow need to pass a key as a parameter to abort and remove a StoreFileWriter instance from the new Set that holds them while the compactions are in progress. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1096916904

   Pushed updates responding to review feedback. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1099396712

   Please see if this POC could make things better...
   
   I've introduced a a StoreFileWriterCreationTracker(actually in most places it is just a Consumer<Path>), for recording the creation of a StoreFileWriter, and this time we could also track the written files for flush. The trackers are maintained in HStore, so in compactor we do not need to store them any more.
   
   https://github.com/Apache9/hbase/commit/79fa3a9d72aade0bfe490b301f909b3d5722de06


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098232878

   Maybe we do not call it resetCompactionWriter? We just call it cleanupCompaction or something else, which indicate that the compaction is finally finished, and the compactor should release all the related resources of this compaction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the `writer` field to null, and so the method bodies become empty after that field is converted into a parameter and they no longer have anything to do, so I removed it. 
   
   If we are going to keep it, we need to use a `Map` instead of `Set` to track the writers, and somehow need to pass a key as a parameter to abort and remove a `StoreFileWriter` instance. What should be the key? The `CompactionRequestImpl`? I do not think there is a requirement to abort compaction writers in this way. We abort compactions today by interrupting the thread. 
   
   If `BrokenStoreFileCleanerChore` will not function correctly without this, then it will need modification. 
   
   I think `BrokenStoreFileCleanerChore` sees the same results from `getCompactionTargets` after these changes. When the compaction is finished the `StoreFileWriter` will be removed from the set in the `finally` block of `compact`, so `getCompactionTargets` will not include the files being written by that writer after that point, which is the same thing that happened when `resetCompactionWriter` would cause the `writer` field in the previous impl to become null, and also the files being written by that writer would no longer appear in `getCompactionTargets` results afterward. But the timing has changed, that is true. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848618359


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   Ah actually there is even a problem here in the current code, let me fix it...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the 'writer' field to null, and so the method body becomes empty when that field is converted into a parameter, so I removed it. 
   
   If we are going to keep it, we need to use a Map instead of Set to track the writers, and somehow need to pass a key as a parameter to abort and remove a StoreFileWriter instance. What should be the key? The CompactionRequestImpl? I do not think there is a requirement to abort compaction writers in this way. We abort compactions today by interrupting the thread. 
   
   If BrokenStoreFileCleanerChore will not function correctly without this, then it will need modification. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098294825

   @joshelser We have the other approach to fall back on but I think we can make the current code work with some polish. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] joshelser commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
joshelser commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r850866084


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -550,22 +551,63 @@ protected InternalScanner createScanner(HStore store, ScanInfo scanInfo,
         dropDeletesFromRow, dropDeletesToRow);
   }
 
-  public List<Path> getCompactionTargets() {
-    T writer = this.writer;
-    if (writer == null) {
-      return Collections.emptyList();
+  /**
+   * Return the progress for a given compaction request.
+   * @param request the compaction request
+   */
+  public CompactionProgress getProgress(CompactionRequestImpl request) {
+    return progressMap.get(request);
+  }
+
+  /**
+   * Return the aggregate progress for all currently active compactions.
+   */
+  public CompactionProgress getProgress() {
+    synchronized (progressMap) {
+      long totalCompactingKVs = 0;
+      long currentCompactedKVs = 0;
+      long totalCompactedSize = 0;
+      for (CompactionProgress progress: progressMap.values()) {
+        totalCompactingKVs += progress.totalCompactingKVs;
+        currentCompactedKVs += progress.currentCompactedKVs;
+        totalCompactedSize += progress.totalCompactedSize;
+      }
+      CompactionProgress result = new CompactionProgress(totalCompactingKVs);
+      result.currentCompactedKVs = currentCompactedKVs;
+      result.totalCompactedSize = totalCompactedSize;
+      return result;
     }
-    if (writer instanceof StoreFileWriter) {
-      return Arrays.asList(((StoreFileWriter) writer).getPath());
+  }
+
+  public boolean isCompacting() {
+    return !progressMap.isEmpty();
+  }
+
+  /**
+   * Return the list of target files for all currently active compactions.
+   */
+  public List<Path> getCompactionTargets() {
+    // Build a list of all the compaction targets for all active writers
+    List<Path> targets = new ArrayList<>();
+    synchronized (writerMap) {
+      for (T writer: writerMap.values()) {
+        if (writer instanceof StoreFileWriter) {
+          targets.add(((StoreFileWriter) writer).getPath());
+        } else {
+          ((AbstractMultiFileWriter) writer).writers().stream()
+            .forEach(sfw -> targets.add(sfw.getPath()));
+        }
+      }
     }
-    return ((AbstractMultiFileWriter) writer).writers().stream().map(sfw -> sfw.getPath())
-      .collect(Collectors.toList());
+    return targets;
   }
 
   /**
-   * Reset the Writer when the new storefiles were successfully added
+   * Complete the compaction after the new storefiles are successfully added.
    */
-  public void resetWriter(){
-    writer = null;
+  public void completeCompaction(CompactionRequestImpl request) {
+    writerMap.remove(request);
+    progressMap.remove(request);

Review Comment:
   I'm assuming that you're generally OK having these two synchronized data structures updated not under the same lock? Given the current use of `writerMap` and `progressMap`, nothing is jumping out at me as bad as you currently have it. Just thinking out loud.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -550,22 +551,63 @@ protected InternalScanner createScanner(HStore store, ScanInfo scanInfo,
         dropDeletesFromRow, dropDeletesToRow);
   }
 
-  public List<Path> getCompactionTargets() {
-    T writer = this.writer;
-    if (writer == null) {
-      return Collections.emptyList();
+  /**
+   * Return the progress for a given compaction request.
+   * @param request the compaction request
+   */
+  public CompactionProgress getProgress(CompactionRequestImpl request) {
+    return progressMap.get(request);
+  }
+
+  /**
+   * Return the aggregate progress for all currently active compactions.
+   */
+  public CompactionProgress getProgress() {
+    synchronized (progressMap) {
+      long totalCompactingKVs = 0;
+      long currentCompactedKVs = 0;
+      long totalCompactedSize = 0;
+      for (CompactionProgress progress: progressMap.values()) {
+        totalCompactingKVs += progress.totalCompactingKVs;
+        currentCompactedKVs += progress.currentCompactedKVs;
+        totalCompactedSize += progress.totalCompactedSize;
+      }
+      CompactionProgress result = new CompactionProgress(totalCompactingKVs);
+      result.currentCompactedKVs = currentCompactedKVs;
+      result.totalCompactedSize = totalCompactedSize;
+      return result;
     }
-    if (writer instanceof StoreFileWriter) {
-      return Arrays.asList(((StoreFileWriter) writer).getPath());
+  }
+
+  public boolean isCompacting() {
+    return !progressMap.isEmpty();
+  }
+
+  /**
+   * Return the list of target files for all currently active compactions.
+   */
+  public List<Path> getCompactionTargets() {
+    // Build a list of all the compaction targets for all active writers
+    List<Path> targets = new ArrayList<>();
+    synchronized (writerMap) {
+      for (T writer: writerMap.values()) {
+        if (writer instanceof StoreFileWriter) {
+          targets.add(((StoreFileWriter) writer).getPath());
+        } else {
+          ((AbstractMultiFileWriter) writer).writers().stream()

Review Comment:
   Suggest: make this an `else if (writer instance of AbstractMultiFileWriter)` to be defensive and have the `else` branch to fail loudly if something goes wrong.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -344,6 +340,9 @@ protected final List<Path> compact(final CompactionRequestImpl request,
     boolean finished = false;
     List<StoreFileScanner> scanners =
       createFileScanners(request.getFiles(), smallestReadPoint, dropCache);
+    T writer = null;
+    CompactionProgress progress = new CompactionProgress(fd.maxKeyCount);
+    progressMap.put(request, progress);

Review Comment:
   Could warn if the return value from `put(...)` was anything but `null` (i.e. we should not have a progress for this `CompactionRequestImpl`)



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -550,22 +551,63 @@ protected InternalScanner createScanner(HStore store, ScanInfo scanInfo,
         dropDeletesFromRow, dropDeletesToRow);
   }
 
-  public List<Path> getCompactionTargets() {
-    T writer = this.writer;
-    if (writer == null) {
-      return Collections.emptyList();
+  /**
+   * Return the progress for a given compaction request.
+   * @param request the compaction request
+   */
+  public CompactionProgress getProgress(CompactionRequestImpl request) {
+    return progressMap.get(request);
+  }
+
+  /**
+   * Return the aggregate progress for all currently active compactions.
+   */
+  public CompactionProgress getProgress() {
+    synchronized (progressMap) {
+      long totalCompactingKVs = 0;
+      long currentCompactedKVs = 0;
+      long totalCompactedSize = 0;
+      for (CompactionProgress progress: progressMap.values()) {
+        totalCompactingKVs += progress.totalCompactingKVs;
+        currentCompactedKVs += progress.currentCompactedKVs;
+        totalCompactedSize += progress.totalCompactedSize;
+      }
+      CompactionProgress result = new CompactionProgress(totalCompactingKVs);
+      result.currentCompactedKVs = currentCompactedKVs;
+      result.totalCompactedSize = totalCompactedSize;
+      return result;
     }
-    if (writer instanceof StoreFileWriter) {
-      return Arrays.asList(((StoreFileWriter) writer).getPath());
+  }
+
+  public boolean isCompacting() {
+    return !progressMap.isEmpty();
+  }
+
+  /**
+   * Return the list of target files for all currently active compactions.
+   */
+  public List<Path> getCompactionTargets() {
+    // Build a list of all the compaction targets for all active writers
+    List<Path> targets = new ArrayList<>();
+    synchronized (writerMap) {
+      for (T writer: writerMap.values()) {
+        if (writer instanceof StoreFileWriter) {
+          targets.add(((StoreFileWriter) writer).getPath());
+        } else {
+          ((AbstractMultiFileWriter) writer).writers().stream()
+            .forEach(sfw -> targets.add(sfw.getPath()));
+        }
+      }
     }
-    return ((AbstractMultiFileWriter) writer).writers().stream().map(sfw -> sfw.getPath())
-      .collect(Collectors.toList());
+    return targets;
   }
 
   /**
-   * Reset the Writer when the new storefiles were successfully added
+   * Complete the compaction after the new storefiles are successfully added.
    */
-  public void resetWriter(){
-    writer = null;
+  public void completeCompaction(CompactionRequestImpl request) {
+    writerMap.remove(request);
+    progressMap.remove(request);

Review Comment:
   Do we care if either of the `remove()` calls returns `null` (i.e. the map doesn't have the mapping in it)?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +376,40 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (!finished) {
+        if (writer != null) {
+          abortWriter(writer);
+        }
+        // This signals that the target file is no longer written and can be cleaned up
+        completeCompaction(request);

Review Comment:
   Clarifying: we only need to call completeCompaction in the exceptional case (compaction didn't finish normally)? And `commitWriter()` down below is doing the same kind of cleanup work that `completeCompaction()` here is doing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1099699638

   I don't see how we can collaborate here when pursuing two different solutions @Apache9 . I am happy to cut my losses and close this and review your solution instead. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098670366

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 20s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 35s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 41s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 35s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 45s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 38s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 206m 49s |  hbase-server in the patch passed.  |
   |  |   | 225m 17s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 8a34d59bfcd2 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f990f56e8e |
   | Default Java | AdoptOpenJDK-11.0.10+9 |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/testReport/ |
   | Max. process+thread count | 2669 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098223100

   > So I suggest that, we add something like a compactionId to the CompactionRequest interface, and use it as the key for our map, when calling StoreEngine.replaceStoreFile, we pass this id in, then we could use this to remove the writer in the Compactor.
   
   I think it's messy and leaks abstraction. 
   We are not going to remove the writer in Compactor. Compactor will create the writer and leave it in the map. External code will call some new API to remove the writer only after the appropriate SFT methods have been called. So this internal detail of Compactor leaks out to all of the users.
   
   Anyway my other idea wouldn't work because of resetCompactionWriter in StoreFileEngine, which assumes that Compactor is a singleton, even though the compaction changes assume it is per compaction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098228279

   @Apache9 I have been testing with my original workaround for what it's worth, https://github.com/apache/hbase/pull/4334 . This does allow concurrent compaction against a given store, respecting that Compactor is not thread safe for now. It works well. As an option to unblock us we could use it for now and come back to the implementation issues with SFT and compaction in a follow up issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the `writer` field to null, and so the method bodies become empty after that field is converted into a parameter and they no longer have anything to do, so I removed it. 
   
   If we are going to keep it, we need to use a `Map` instead of `Set` to track the writers, and somehow need to pass a key as a parameter to abort and remove a `StoreFileWriter` instance. What should be the key? The `CompactionRequestImpl`? I do not think there is a requirement to abort compaction writers in this way. We abort compactions today by interrupting the thread. 
   
   If `BrokenStoreFileCleanerChore` will not function correctly without this, then it will need modification. 
   
   I think `BrokenStoreFileCleanerChore` sees the same results from `getCompactionTargets` after these changes. When the compaction is finished the `StoreFileWriter` will be removed from the set in the `finally` block of `compact`, so `getCompactionTargets` will not include the files being written by that writer after that point, which is the same thing that happened when `resetCompactionWriter` would cause the `writer` field in the previous impl to become null, and also the files being written by that writer would no longer appear in `getCompactionTargets` results afterward. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848600124


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   Yes, but see the asserts below:
   
       assert finished : "We should have exited the method on all error paths";
       assert writer != null : "Writer should be non-null if no error";
   
   These were in place even without a `return` here in the previous code. I can add it. It does seem it was (and is) incorrect.
   
   When we drop through to the end of this method there is a finished `writer` we can commit, or so this claims. 
   
   I will also convert those assertions to throws of `IllegalStateException` so we can see if these invariants are not holding even when assertions are not enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848628626


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   Fixed, at least now we do not remove the writer from the set until after commit. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1099700342

   Let @Apache9 propose a PR for his solution. 
   No hard feelings. Need to conserve my time. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell closed pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell closed pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration
URL: https://github.com/apache/hbase/pull/4338


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1099736811

   > Let @Apache9 propose a PR for his solution that moves the scope of where this state is tracked, makes sense to me. No hard feelings. Need to conserve my time. We shouldn't be pursuing two different solutions. Let's just review the one that is cleaner, I think both Duo's proposal and Josh's comment expressed that preference.
   
   Oh, sorry, I didn't mean to push my solution. Why I implement a POC is that seems we all think the current approach in this PR is a bit messy as we can not clean up the state with the 'try..finally' style, so I just give us an impression that what the code will look like if we can do the cleanup with the 'try...finally' style.
   
   In fact I think the current solution in this PR is better enough to solve the problem for now and all related classes are IA.Private, so we are free to apply this PR first and then change to other approachs in the future.
   
   So I'm neutrality on these two approach, especially if we want to fix the problem ASAP. Then we will have plenty of time to discuss what is a better solution, as it will not block the 2.5.0 release.
   
   Just my thpughts, sorry if I didn't say this clearly before posting the POC. Anyway, if you guys think the approach in POC is generally good, let me convert it to a PR so you can better review it.
   
   Thanks~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848629185


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   I removed these assertions because the latest edits have changed the code flow here.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848600124


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   Yes, but see the asserts below:
   
       assert finished : "We should have exited the method on all error paths";
       assert writer != null : "Writer should be non-null if no error";
   
   These were in place even without a `return` here in the previous code. I can add it, though. 
   
   When we drop through to the end of this method there is a finished `writer` we can commit, or so this claims. I will convert those assertions to throws of `IllegalStateException` so we can see if these invariants are not holding even when assertions are not enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848960055


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +376,50 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        this.progress.remove(progress);
+      }
+      if (writer != null) {
+        try {
+          if (!finished) {
+            // There was an error, so abort the writer.
+            abortWriter(writer);
+            return Collections.emptyList();
+          } else {
+            // Finished, commit the writer's results.
+            return commitWriter(writer, fd, request);

Review Comment:
   The commitWriter here just means append metadata and close the writer, it does not mean to record the files in SFT...
   
   We still need to add something after the replaceStoreFile call to remove the writer...
   
   So maybe you are right, we need to use a Map instead of Set, the key can be the CompactionRequestImpl?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098637886

   I think the key problem here is the compactionTargets is used during the whole compaction life time, but the state is only stored in compactor, which is only used for generating the new store files, so no matter what we do, it is still very awkward.
   
   I got another idea in mind that, maybe we could just pass a StoreFileWriterTargetCollector to the compactor, the compactor implementation will add new target to the collector when creating new store file writer. And the collector is stored in StoreEngine or HStore, so we are free to clean it up at any time. The compactor does not need to do the cleanup work any more. I could implement a POC to see if it works. In general, I think we should also track the target of flushed store files as well, but this is a missing part in the original patch of HBASE-26271.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the 'writer' field to null, and so the method bodies become empty after that field is converted into a parameter and they no longer have anything to do, so I removed it. 
   
   If we are going to keep it, we need to use a Map instead of Set to track the writers, and somehow need to pass a key as a parameter to abort and remove a StoreFileWriter instance. What should be the key? The CompactionRequestImpl? I do not think there is a requirement to abort compaction writers in this way. We abort compactions today by interrupting the thread. 
   
   If BrokenStoreFileCleanerChore will not function correctly without this, then it will need modification. 
   
   I think BrokenStoreFileCleanerChore sees the same results without this after these changes. When the compaction is finished the StoreFileWriter is removed from the set in the `finally` block, so getCompactionTargets will not include the files being written by that writer, which is the same thing that happened when resetCompactionWriter() would cause the field to become null so the files being written by that writer would no longer appear in getCompactionTargets results.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098293904

   Actually the keyed maps approach solves another problem I found. Posting an update soon. Making sure compaction unit tests all pass first. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1095898645

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 48s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 41s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 17s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 18s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 10s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 13s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 13s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  12m  2s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 10s |  The patch does not generate ASF License warnings.  |
   |  |   |  31m  9s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile |
   | uname | Linux 73b794091bc1 5.4.0-1043-aws #45~18.04.1-Ubuntu SMP Fri Apr 9 23:32:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 1d8a5bf0e3 |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | Max. process+thread count | 69 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/1/console |
   | versions | git=2.17.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848600124


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   Yes, but see the asserts below:
   
       assert finished : "We should have exited the method on all error paths";
       assert writer != null : "Writer should be non-null if no error";
   
   When we drop through to the end of this method there is a finished `writer` we can commit. I will convert those assertions to throws of `IllegalStateException` so we can see if these invariants are not holding even when assertions are not enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098233935

   > @Apache9 I have been testing with my original workaround for what it's worth, #4334 . That change does not allow concurrent compaction against a given store, respecting that Compactor is not thread safe for now. It works well. The performance of the test scenario is unchanged from baseline without any SFT changes. As an option to unblock us we could use it for now and come back to the implementation issues with SFT and compaction in a follow up issue.
   
   I do not think this is a good way to solve the problem, we do allow concurrent compactions happen at the same time in the past...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098234941

   > I do not think this is a good way to solve the problem, we do allow concurrent compactions happen at the same time in the past...
   
   I agree. It would be to unblock us to give more time to think about SFT design around compaction, not a permanent solution. Anyway I will update this PR as promised soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1097199903

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 16s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 43s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 40s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 35s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 47s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 39s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 238m  8s |  hbase-server in the patch failed.  |
   |  |   | 256m 57s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 5e1523f7f52c 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / ea9bc92ce2 |
   | Default Java | AdoptOpenJDK-11.0.10+9 |
   | unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/testReport/ |
   | Max. process+thread count | 2604 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1097199341

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 29s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m  0s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 44s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 59s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m  7s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 233m 37s |  hbase-server in the patch failed.  |
   |  |   | 256m 10s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux c5fa78127f30 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / ea9bc92ce2 |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/testReport/ |
   | Max. process+thread count | 2638 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848001281


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -93,13 +94,11 @@
   protected static final String MINOR_COMPACTION_DROP_CACHE =
       "hbase.regionserver.minorcompaction.pagecache.drop";
 
-  private final boolean dropCacheMajor;
-  private final boolean dropCacheMinor;
+  protected final boolean dropCacheMajor;
+  protected final boolean dropCacheMinor;
 
-  // In compaction process only a single thread will access and write to this field, and
-  // getCompactionTargets is the only place we will access it other than the compaction thread, so
-  // make it volatile.
-  protected volatile T writer = null;
+  protected final Set<T> writers = new HashSet<>();

Review Comment:
   writers = Collections.synchronizedSet(Collections.newSetFromMap(new IdentityHashMap<>()));
   
   I think IdentityHashMap is safer as we do not know whether the implementation class for CellSink will implement equals or not. And also use synchronizedSet so we do not need to manually synchronize everywhere, the only place we need to manually synchronize is when we want to iterator over the it.
   
   And the same to the CompactionProgress field.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   This is not needed any more?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   We do not need to return here?
   
   Strange...



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -550,22 +558,35 @@ protected InternalScanner createScanner(HStore store, ScanInfo scanInfo,
         dropDeletesFromRow, dropDeletesToRow);
   }
 
-  public List<Path> getCompactionTargets() {
-    T writer = this.writer;
-    if (writer == null) {
-      return Collections.emptyList();
-    }
-    if (writer instanceof StoreFileWriter) {
-      return Arrays.asList(((StoreFileWriter) writer).getPath());
+  public CompactionProgress getProgress() {
+    long currentCompactedKvs = 0;
+    long totalCompactingKVs = 0;
+    long totalCompactedSize = 0;
+    synchronized (progress) {
+      for (CompactionProgress p: progress) {
+        currentCompactedKvs += p.getCurrentCompactedKvs();
+        totalCompactingKVs += p.getTotalCompactingKVs();
+        totalCompactedSize += p.getTotalCompactedSize();
+      }
     }
-    return ((AbstractMultiFileWriter) writer).writers().stream().map(sfw -> sfw.getPath())
-      .collect(Collectors.toList());
+    return new CompactionProgress(totalCompactingKVs)
+      .setCurrentCompactedKvs(currentCompactedKvs)
+      .setTotalCompactedSize(totalCompactedSize);
   }
 
-  /**
-   * Reset the Writer when the new storefiles were successfully added
-   */
-  public void resetWriter(){
-    writer = null;
+  public List<Path> getCompactionTargets() {
+    // Build a list of all the compaction targets for all active writers
+    List<Path> targets = new ArrayList<>();
+    synchronized (writers) {
+      for (T writer: writers) {
+        if (writer instanceof StoreFileWriter) {
+          targets.add(((StoreFileWriter) writer).getPath());
+        }
+        ((AbstractMultiFileWriter) writer).writers().stream()

Review Comment:
   else



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1096984018

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 28s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 10s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 43s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 41s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 37s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 13s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 50s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 50s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  16m 51s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 12s |  The patch does not generate ASF License warnings.  |
   |  |   |  41m 22s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile |
   | uname | Linux 8b1fd70052cf 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / ea9bc92ce2 |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | Max. process+thread count | 70 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/2/console |
   | versions | git=2.17.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the 'writer' field to null. The method body becomes empty when that field is converted into a parameter. 
   
   If we are going to keep it, we somehow need to pass a key as a parameter to abort and remove a StoreFileWriter instance from the new Set that holds them while the compactions are in progress. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848598347


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   Logically we need this to keep correctness. IIRC, the problem here is that, we can only cleanup the writer instance after we successfully commit the store files to SFT, i.e, after the replaceStoreFile method. That's why we can not just simply remove the writer instance in commitWriter, otherwise there could be data loss, i.e, the BrokenStoreFileCleanerChore may delete the store files which are written just now but have not been added to the SFT yet...
   
   Let me check again if the new implementation can solve the problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098660374

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 39s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 46s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 52s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m  6s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 34s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 50s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 183m 11s |  hbase-server in the patch passed.  |
   |  |   | 201m  9s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4338 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux dd6216435bbc 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f990f56e8e |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/testReport/ |
   | Max. process+thread count | 3052 (vs. ulimit of 30000) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4338/3/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] joshelser commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
joshelser commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098257270

   > That change does not allow concurrent compaction against a given store, respecting that Compactor is not thread safe for now. It works well. The performance of the test scenario is unchanged from baseline without any SFT changes. As an option to unblock us we could use it for now and come back to the implementation issues with SFT and compaction in a follow up issue.
   
   I agree with your feelings, Andrew. Correctness first and then optimization.
   
   > we do allow concurrent compactions happen at the same time in the past...
   
   Acknowledging this too: yes, we should be able to compact two distinct subsets of the files in a store concurrently. And, I could see value in doing so (e.g. compacting three larger files in a store and also wanting to compact a few smaller hfiles created from memstore flushes -- we don't want to wait for that big compaction to finish to run the smaller one).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r850876906


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -344,6 +340,9 @@ protected final List<Path> compact(final CompactionRequestImpl request,
     boolean finished = false;
     List<StoreFileScanner> scanners =
       createFileScanners(request.getFiles(), smallestReadPoint, dropCache);
+    T writer = null;
+    CompactionProgress progress = new CompactionProgress(fd.maxKeyCount);
+    progressMap.put(request, progress);

Review Comment:
   Right, the null handling was unfinished. Same for your other point as well. But I think we will be going with @Apache9 's alternative. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1099701991

   One thing I will note is I also tried to clean up progress reporting, which will need to be separately addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098217770

   > Last set of changes broke tests so I will revert them.
   > 
   > @Apache9 I think we need to go back to the drawing board here. We need to put the call out to SFT logic somewhere else, or we need to make Compactor something that is created per compaction as was the assumption behind the changes that added the 'writer' field and requires the semantics offered by that. I will try the latter.
   
   Creating Compactor per compaction maybe too big? FWIW, we have a Compactor instance in StoreEngine, if we want to make it per compaction, it will cause a very big refactoring.
   
   So I suggest that, we add something like a compactionId to the CompactionRequest interface, and use it as the key for our map, when calling StoreEngine.replaceStoreFile, we pass this id in, then we could use this to remove the writer in the Compactor.
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098211813

   Last set of changes broke tests so I will revert them.
   
   @Apache9 I think we need to go back to the drawing board here. We need to put the call out to SFT logic somewhere else, or we need to make Compactor something that is created per compaction as was the assumption behind the changes that added the 'writer' field and requires the semantics offered by that. I will try the latter. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1098234030

   > Maybe we do not call it resetCompactionWriter? We just call it cleanupCompaction or something else, which indicate that the compaction is finally finished, and the compactor should release all the related resources of this compaction.
   
   I still think it is messy.
   We have this PR in progress already, so I will make this change, so we can at least look at it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848580369


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   We can leave it in the API, but the current implementation only set the 'writer' field to null, and so the method bodies become empty after that field is converted into a parameter and they no longer have anything to do, so I removed it. 
   
   If we are going to keep it, we need to use a Map instead of Set to track the writers, and somehow need to pass a key as a parameter to abort and remove a StoreFileWriter instance. What should be the key? The CompactionRequestImpl? I do not think there is a requirement to abort compaction writers in this way. We abort compactions today by interrupting the thread. 
   
   If BrokenStoreFileCleanerChore will not function correctly without this, then it will need modification. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848611867


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java:
##########
@@ -1157,11 +1154,6 @@ protected List<HStoreFile> doCompaction(CompactionRequestImpl cr,
     }
     replaceStoreFiles(filesToCompact, sfs, true);
 
-    // This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
-    // CleanerChore know that compaction is done and the file can be cleaned up if compaction
-    // have failed.
-    storeEngine.resetCompactionWriter();

Review Comment:
   Thanks for that, you know `BrokenStoreFileCleanerChore` best.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] joshelser commented on pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
joshelser commented on PR #4338:
URL: https://github.com/apache/hbase/pull/4338#issuecomment-1097413494

   FWIW, we had a big `ycsb load` running from multiple nodes in parallel against a 5 node cluster with this change that ran happily for 6 hours (eventually failed due to an OOME which I think is just a misconfiguration by us).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848600124


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   Yes, but see the asserts below:
   
       assert finished : "We should have exited the method on all error paths";
       assert writer != null : "Writer should be non-null if no error";
   
   These were in place even without a `return` here. I can add it, though. 
   
   When we drop through to the end of this method there is a finished `writer` we can commit, or so this claims. I will convert those assertions to throws of `IllegalStateException` so we can see if these invariants are not holding even when assertions are not enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] apurtell commented on a diff in pull request #4338: HBASE-26938 Compaction failures after StoreFileTracker integration

Posted by GitBox <gi...@apache.org>.
apurtell commented on code in PR #4338:
URL: https://github.com/apache/hbase/pull/4338#discussion_r848600124


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java:
##########
@@ -381,34 +377,46 @@ protected final List<Path> compact(final CompactionRequestImpl request,
       } else {
         Closeables.close(scanner, true);
       }
-      if (!finished && writer != null) {
-        abortWriter();
+      if (progress != null) {
+        synchronized (this.progress) {
+          this.progress.remove(progress);
+        }
+      }
+      if (writer != null) {

Review Comment:
   Yes, but see the asserts below:
   
       assert finished : "We should have exited the method on all error paths";
       assert writer != null : "Writer should be non-null if no error";
   
   When we drop through to the end of this method there is a finished `writer` we can commit, or so this claims. I will convert those assertions to throws of `IllegalStateException` so we can see if these invariants are not holding even when assertions are not enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org