You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2022/11/30 08:06:55 UTC

[GitHub] [jackrabbit-oak] thomasmueller commented on a diff in pull request #771: OAK-9975: [DSGC] Report cummulative size of referenced blobs during Mark phase

thomasmueller commented on code in PR #771:
URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035637354


##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];

Review Comment:
   I'm not sure about exception handling. If the file is corrupt (eg. empty line, corruption length,...), what would happen? It looks like it would throw an exception. Instead, a warning should be logged, and then it should continue (I assume).



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -689,6 +711,9 @@ public long checkConsistency() throws Exception {
 
             // Mark all used blob references
             iterateNodeTree(fs, true);
+            // Move the marked references file to the data store meta area if applicable
+            String uniqueSuffix = UUID.randomUUID().toString();
+            GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);

Review Comment:
   I'm not sure... it doesn't seem to be related to reporting the size? Not saying it is wrong... Just that I would expect it in a different PR.



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -703,6 +728,21 @@ public long checkConsistency() throws Exception {
                 // Retrieve all other marked present in the datastore
                 List<DataRecord> refFiles =
                     ((SharedDataStore) blobStore).getAllMetadataRecords(SharedStoreRecordType.REFERENCES.getType());
+
+                // Get all the repositories registered
+                List<DataRecord> repoFiles =
+                    ((SharedDataStore) blobStore).getAllMetadataRecords(SharedStoreRecordType.REPOSITORY.getType());
+                LOG.info("Repositories registered {}", repoFiles);
+
+                // Retrieve repos for which reference files have not been created
+                Set<String> unAvailRepos =
+                    SharedDataStoreUtils.refsNotAvailableFromRepos(repoFiles, refFiles);
+                LOG.info("Repositories with unavailable references {}", unAvailRepos);
+                
+                if (!unAvailRepos.isEmpty()) {
+                    throw new NotAllRepositoryMarkedException("Not all repositories have marked references available");

Review Comment:
   Same as above: it doesn't seem to be related to reporting the size? 



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -1156,6 +1202,10 @@ class GarbageCollectionOperationStats implements OperationsStatsMBean {
         static final String NUM_BLOBS_DELETED = "NUM_BLOBS_DELETED";
         static final String TOTAL_SIZE_DELETED = "TOTAL_SIZE_DELETED";
         static final String NUM_CANDIDATES = "NUM_CANDIDATES";
+        
+        static final String NUM_BLOB_REFERENCES = "NUM_BLOB_REFERENCES";

Review Comment:
   I would rename the fields...



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);
+
+                if (length != -1) {
+                    stats.getCollector().updateSizeBlobReferences(length);

Review Comment:
   what about "addTotalBlobSize"?



##########
oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java:
##########
@@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE
 
         // Mark all used references
         iterateNodeTree(fs, false);
-
+        
+        // Get size
+        sizeBlobStoreReferences(fs, stats);
+        
         // Move the marked references file to the data store meta area if applicable
         GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix);
 
         LOG.debug("Ending mark phase of the garbage collector");
     }
 
+    private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats)
+        throws IOException {
+        try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) {
+            lineIterator.forEachRemaining(line -> {
+                String id = line.split(DELIM)[0];
+                long length = DataStoreBlobStore.BlobId.of(id).getLength();
+                LOG.info("Blob {} has size {}", id, length);
+
+                stats.getCollector().updateNumBlobReferences(1);

Review Comment:
   what about "addTotalBlobCount" instead of "updateNumBlobReferences"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org