Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2020/06/04 13:07:15 UTC

[GitHub] [geode] jujoramos commented on a change in pull request #5099: GEODE-8029: Delete orphaned drf files (2nd try)

jujoramos commented on a change in pull request #5099:
URL: https://github.com/apache/geode/pull/5099#discussion_r435237982



##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/Oplog.java
##########
@@ -937,14 +937,21 @@ void initAfterRecovery(boolean offline) {
         // this.crf.raf.seek(this.crf.currSize);
       } else if (!offline) {
         // drf exists but crf has been deleted (because it was empty).
-        // I don't think the drf needs to be opened. It is only used during
-        // recovery.
-        // At some point the compacter my identify that it can be deleted.
         this.crf.RAFClosed = true;
         deleteCRF();
+
+        // See GEODE-8029.

Review comment:
       > This change comes into the picture after the recovery is completed - "initAfterRecovery".
   The real problem seems to be during recovery. If there is a periodic recovery this could be fine, but if there is a large number of deleted records during the first recovery, the issue may still arise... We may need to have some checks during recovery (before reading the drfs) to avoid reading large numbers of deleted records. Please correct me if my understanding is wrong.
   
   Your understanding is correct; however, the changes are meant to prevent the issue from happening in the first place. By deleting the unused `drf` files here, and so avoiding storing them on disk when they are not needed, users won't hit the `IllegalStateException` at all.
   As a side note, in order to hit the problem in the first place, the user would need to have more than `805306401` delete operations within the `opLog` files in one single run, and compaction must not have run at all, which is highly unlikely.
   If you think we should instead count the deleted records while reading the files and, somehow, expand the load factor when needed (or something similar), let me know and I'll give it a try. I'm worried, though, about the performance impact this might have while recovering files from disk... the recovery time would suffer during every single startup and, considering that there's only a handful of scenarios in which the actual issue can happen, I believe making sure we delete unused files is a better approach.
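   To make the alternative concrete, here is a rough sketch of what a recovery-time guard could look like. The class, field, and method names below are hypothetical illustrations, not Geode API; only the `805306401` limit and the `IllegalStateException` come from the discussion above:

   ```java
   // Hypothetical sketch only: count deleted records as drfs are read and
   // fail fast before the backing map's maximum fill is exceeded.
   public class DrfRecoveryGuard {
     // Limit taken from the discussion above; the exact derivation from the
     // map's load factor is an assumption here.
     static final long MAX_DELETED_RECORDS = 805_306_401L;

     private long deletedRecordCount;

     /** Returns true if one more deleted record can be recovered safely. */
     public boolean canRecoverAnotherDelete() {
       return deletedRecordCount < MAX_DELETED_RECORDS;
     }

     /** Called once per deleted record read from a drf during recovery. */
     public void recordDelete() {
       if (!canRecoverAnotherDelete()) {
         throw new IllegalStateException(
             "Too many deleted records to recover; run compaction or delete unused drf files");
       }
       deletedRecordCount++;
     }
   }
   ```

   Note that this check itself is cheap; the real cost I'm worried about would come from growing or re-hashing the map during recovery, which this sketch does not attempt.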
   
   > Also, looking at the other part of the code where "setHasDeletes" is called, it looks like it is called before the deleteDRF() - there could be a reason for doing this.
   
   Will change this, thanks for catching it.
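   For clarity, the reordering I'll apply looks roughly like this. Only `setHasDeletes` and `deleteDRF` are real names from `Oplog.java`; the surrounding class and the `dropUnusedDrf`/`drfDeleted` members are simplified stand-ins for illustration:

   ```java
   // Simplified stand-in for Oplog, illustrating call ordering only.
   class OplogSketch {
     private boolean hasDeletes = true;
     boolean drfDeleted = false;

     void setHasDeletes(boolean v) {
       hasDeletes = v;
     }

     void deleteDRF() {
       // In the real Oplog this also calls getOplogSet().removeDrf(this).
       drfDeleted = true;
     }

     // Mirror the other call sites: clear the deletes flag first,
     // then delete the drf file.
     void dropUnusedDrf() {
       setHasDeletes(false);
       deleteDRF();
     }
   }
   ```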
   
   > And there is also call to "getOplogSet().removeDrf(this);" This may be needed here...
   
   This is already done within the `deleteDRF()` method.
   
   ---
   
   Please let me know what you think about point 1 (add an extra check while recovering to make sure we are below the load-factor threshold), so I can go ahead and make all the changes at once.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org