You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2022/11/29 06:27:10 UTC

[GitHub] [jackrabbit-oak] amit-jain opened a new pull request, #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

amit-jain opened a new pull request, #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771

   - Introduce new metrics NUM_BLOB_REFERENCES, BLOB_REFERENCES_SIZE
   - calculate and push metrics in check consistency when aggregating all the references from all repositories
   - calculate and push metrics on the mark phase and checkConsistencyAfterGc as well
   - update tests to assert values for new metrics


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] sonarcloud[bot] commented on pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
sonarcloud[bot] commented on PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#issuecomment-1333263601

   SonarCloud Quality Gate failed.&nbsp; &nbsp; [![Quality Gate failed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/failed-16px.png 'Quality Gate failed')](https://sonarcloud.io/dashboard?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=CODE_SMELL) [2 Code Smells](https://sonarcloud.io/project/issues?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&resolved=false&types=CODE_SMELL)
   
   [![0.0%](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/0-16px.png '0.0%')](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&metric=new_coverage&view=list) [0.0% Coverage](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&metric=new_coverage&view=list)  
   [![0.0%](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/3-16px.png '0.0%')](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&metric=new_duplicated_lines_density&view=list) [0.0% Duplication](https://sonarcloud.io/component_measures?id=org.apache.jackrabbit%3Ajackrabbit-oak&pullRequest=771&metric=new_duplicated_lines_density&view=list)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1036694562


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);
+
+                if (length != -1) {
+                    stats.getCollector().updateSizeBlobReferences(length);

Review Comment:
   renamed to updateBlobReferencesSize, as above since all other methods have update



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] fabriziofortino commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
fabriziofortino commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035640288


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -1076,8 +1119,8 @@ void handleRemoves(GarbageCollectableBlobStore blobStore, File removedIds, File
         }
 
         void checkConsistencyAfterGC(GarbageCollectableBlobStore blobStore, GarbageCollectorFileState fs,
-            OperationStatsCollector consistencyStatsCollector, File root) throws IOException {
-            consistencyStatsCollector.start();
+            GarbageCollectionOperationStats stats, File root) throws IOException {

Review Comment:
   `root` param is never used



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035663780


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];

Review Comment:
   This method uses the same file that Mark phase uses where its essential to bail out if an error. Here though not very essential but not sure if a good idea to report inaccurate size. In rare case (as havent seen that kind of corrption in practice) that happens i think its ok to throw an error. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] thomasmueller commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
thomasmueller commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035637354


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];

Review Comment:
   I'm not sure about exception handling. If the file is corrupt (eg. empty line, corruption length,...), what would happen? It looks like it would throw an exception. Instead, a warning should be logged, and then it should continue (I assume).



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -689,6 +711,9 @@ public long checkConsistency() throws Exception {
 
             // Mark all used blob references
             iterateNodeTree(fs, true);
+            // Move the marked references file to the data store meta area if applicable
+            String uniqueSuffix = UUID.randomUUID().toString();
+            GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);

Review Comment:
   I'm not sure... it doesn't seem to be related to reporting the size? Not saying it is wrong... Just that I would expect it in a different PR.



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -703,6 +728,21 @@ public long checkConsistency() throws Exception {
                 // Retrieve all other marked present in the datastore
                 List<DataRecord> refFiles =
                     ((SharedDataStore) blobStore).getAllMetadataRecords(SharedStoreRecordType.REFERENCES.getType());
+
+                // Get all the repositories registered
+                List<DataRecord> repoFiles =
+                    ((SharedDataStore) blobStore).getAllMetadataRecords(SharedStoreRecordType.REPOSITORY.getType());
+                LOG.info("Repositories registered {}", repoFiles);
+
+                // Retrieve repos for which reference files have not been created
+                Set<String> unAvailRepos =
+                    SharedDataStoreUtils.refsNotAvailableFromRepos(repoFiles, refFiles);
+                LOG.info("Repositories with unavailable references {}", unAvailRepos);
+                
+                if (!unAvailRepos.isEmpty()) {
+                    throw new NotAllRepositoryMarkedException("Not all repositories have marked references available");

Review Comment:
   Same as above: it doesn't seem to be related to reporting the size? 



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -1156,6 +1202,10 @@ class GarbageCollectionOperationStats implements OperationsStatsMBean {
         static final String NUM_BLOBS_DELETED = "NUM_BLOBS_DELETED";
         static final String TOTAL_SIZE_DELETED = "TOTAL_SIZE_DELETED";
         static final String NUM_CANDIDATES = "NUM_CANDIDATES";
+        
+        static final String NUM_BLOB_REFERENCES = "NUM_BLOB_REFERENCES";

Review Comment:
   I would rename the fields...



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);
+
+                if (length != -1) {
+                    stats.getCollector().updateSizeBlobReferences(length);

Review Comment:
   what about "addTotalBlobSize"?



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);

Review Comment:
   what about "addTotalBlobCount" instead of "updateNumBlobReferences"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035666309


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);
+
+                if (length != -1) {
+                    stats.getCollector().updateSizeBlobReferences(length);

Review Comment:
   similar maybe`addTotalBlobReferencesSize`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035665580


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);

Review Comment:
   actually would have to differentiate between total blob count and total references so maybe `addTotalBlobReferencesCount`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035670123


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -689,6 +711,9 @@ public long checkConsistency() throws Exception {
 
             // Mark all used blob references
             iterateNodeTree(fs, true);
+            // Move the marked references file to the data store meta area if applicable
+            String uniqueSuffix = UUID.randomUUID().toString();
+            GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);

Review Comment:
   Yes, the process only looked at the files generated from local, but neither the consistency nor the size is accurate if not accounting all references available (as in sweep). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain merged pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain merged PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035671462


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -1156,6 +1202,10 @@ class GarbageCollectionOperationStats implements OperationsStatsMBean {
         static final String NUM_BLOBS_DELETED = "NUM_BLOBS_DELETED";
         static final String TOTAL_SIZE_DELETED = "TOTAL_SIZE_DELETED";
         static final String NUM_CANDIDATES = "NUM_CANDIDATES";
+        
+        static final String NUM_BLOB_REFERENCES = "NUM_BLOB_REFERENCES";

Review Comment:
   Would rename as per the above approach for methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035664240


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -1076,8 +1119,8 @@ void handleRemoves(GarbageCollectableBlobStore blobStore, File removedIds, File
         }
 
         void checkConsistencyAfterGC(GarbageCollectableBlobStore blobStore, GarbageCollectorFileState fs,
-            OperationStatsCollector consistencyStatsCollector, File root) throws IOException {
-            consistencyStatsCollector.start();
+            GarbageCollectionOperationStats stats, File root) throws IOException {

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1036694328


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);

Review Comment:
   I pushed a few changes to the naming but left these as I see the earlier ones are also named update.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] amit-jain commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

Posted by GitBox <gi...@apache.org>.
amit-jain commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1036694953


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -1156,6 +1202,10 @@ class GarbageCollectionOperationStats implements OperationsStatsMBean {
         static final String NUM_BLOBS_DELETED = "NUM_BLOBS_DELETED";
         static final String TOTAL_SIZE_DELETED = "TOTAL_SIZE_DELETED";
         static final String NUM_CANDIDATES = "NUM_CANDIDATES";
+        
+        static final String NUM_BLOB_REFERENCES = "NUM_BLOB_REFERENCES";

Review Comment:
   Ok so no update here as these are blob references. Let me know if still think its a problem



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org