You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhaomin1423 (via GitHub)" <gi...@apache.org> on 2023/10/24 04:20:43 UTC

[PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

zhaomin1423 opened a new pull request, #43502:
URL: https://github.com/apache/spark/pull/43502

   
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   use java.lang.ref.Cleaner instead of finalize() for RocksDBIterator
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   The finalize() method has been marked as deprecated since Java 9 and will be removed in the future, java.lang.ref.Cleaner is the more recommended solution.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Pass actions
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1396673560


##########
.github/workflows/build_and_test.yml:
##########
@@ -200,6 +200,7 @@ jobs:
       SKIP_UNIDOC: true
       SKIP_MIMA: true
       SKIP_PACKAGING: true
+      LIVE_UI_LOCAL_STORE_DIR: "/tmp/kvStore"

Review Comment:
   @zhaomin1423 All current tests have passed, and the additional verification purpose for rocksdb has been achieved. We need to remove the changes to this file before merging.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1398623958


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   We can revert change of `closeIterator` method in `LevelDB` and refactor it as :
   
   ```java
     void closeIterator(DBIterator it) throws IOException {
       notifyIteratorClosed(it);
       synchronized (this._db) {
         DB _db = this._db.get();
         if (_db != null) {
           it.close();
         }
       }
     }
   ```
   
   then there we can call `closeIterator` directly like:
   
   ```java
   public void run() {
         if (started.compareAndSet(true, false)) {
           try {
             levelDB.closeIterator(dbIterator);
           } catch (IOException e) {
             // logXXX("Failed to close iterator", e);
           }
         }
       }
   ```
   
   Do you mean this idea? @mridulm 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1378069029


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,56 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    try {
+      for (int i = 0; i < 8192; i++) {
+        dbForCleanerTest.write(createCustomType1(i));
+      }
+      RocksDBIterator<CustomType1> rocksDBIterator =
+        (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+      Reference<RocksDBIterator<?>> reference =
+        getRocksDBIteratorRef(rocksDBIterator, dbForCleanerTest);
+      assertNotNull(reference);
+      RocksIterator it = rocksDBIterator.internalIterator();
+      // it has not been closed yet, isOwningHandle should be true.
+      assertTrue(it.isOwningHandle());
+      // Manually set rocksDBIterator to null, to be GC.
+      rocksDBIterator = null;
+      // 100 times gc, the rocksDBIterator should be GCed.
+      int count = 0;
+      while (count < 100 && !reference.refersTo(null)) {
+        System.gc();
+        count++;
+        Thread.sleep(100);
+      }
+      // check rocksDBIterator should be GCed
+      assertTrue(reference.refersTo(null));
+      // Verify that the Cleaner will be executed after a period of time,
+      // and it.isOwningHandle() will become false.
+      assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));

Review Comment:
   Given the addition of a state variable, can the test be simplified?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377001116


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   @zhaomin1423 Overall, we want to make the close method of `RocksDBIterator` look similar. The `notifyIteratorClosed ` method in `RocksDB`  needs to be retained (it can be refactored for reuse by `ResourceCleaner`).
   
   ```
     public synchronized void close() throws IOException {
       db.notifyIteratorClosed(this);
       if (!closed) {
         it.close();
         closed = true;
         next = null;
         this.cancelXXX();
       }
     }
   ```
   
   The purpose of `this.cancelXXX();`  is to prevent `ResourceCleaner` from executing repeatedly when manually closed.
   
   
   



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   @zhaomin1423 Overall, we want to make the close method of `RocksDBIterator` look similar. The `notifyIteratorClosed ` method in `RocksDB`  needs to be retained (it can be refactored for reuse by `ResourceCleaner`).
   
   ```
     public synchronized void close() throws IOException {
       db.notifyIteratorClosed(this);
       if (!closed) {
         it.close();
         closed = true;
         next = null;
         this.cancelXXX();
       }
     }
   ```
   
   The purpose of `this.cancelXXX();`  is to prevent `ResourceCleaner` from executing repeatedly when manually closed.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377001120


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   @zhaomin1423 Overall, I want to make the close method of `RocksDBIterator` look similar. The `notifyIteratorClosed ` method in `RocksDB`  needs to be retained (it can be refactored for reuse by `ResourceCleaner`).
   
   ```
     public synchronized void close() throws IOException {
       db.notifyIteratorClosed(this);
       if (!closed) {
         it.close();
         closed = true;
         next = null;
         this.cancelXXX();
       }
     }
   ```
   
   The purpose of `this.cancelXXX();`  is to prevent `ResourceCleaner` from executing repeatedly when manually closed.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1778424888

   Let's add a test case first.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371083522


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +269,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        rocksIterator.close();
+      }
+      iteratorTracker.removeIf(ref -> {
+          RocksDBIterator<?> rocksDBIterator = ref.get();
+          if (rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it)) {
+            return true;
+          } else {
+            return false;
+          }
+        });

Review Comment:
   thanks, fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371083658


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -50,6 +56,8 @@ class RocksDBIterator<T> implements KVStoreIterator<T> {
     this.ti = db.getTypeInfo(type);
     this.index = ti.index(params.index);
     this.max = params.max;
+    this.cleanable = CLEANER.register(this,
+        new RocksDBIterator.ResourceCleaner(it, db.getRocksDB(), db.getIteratorTracker()));

Review Comment:
   Could we just pass in `db`?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371245627


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,22 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {
+    return iteratorTracker;
+  }
+
+  @VisibleForTesting

Review Comment:
   hmm.. this method can be moved to RocksDBSuite.java as a private method due to `RocksDB`  already has `getIteratorTracker(`
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371254821


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,40 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    for (int i = 0; i < 8192; i++) {
+      dbForCleanerTest.write(createCustomType1(i));
+    }
+
+    RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+    Reference<RocksDBIterator<?>> reference = dbForCleanerTest.getRocksDBIteratorRef(rocksDBIterator);
+    assertNotNull(reference);
+    RocksIterator it = rocksDBIterator.internalIterator();
+    // it has not been closed yet, isOwningHandle should be true.
+    assertTrue(it.isOwningHandle());
+    // Manually set rocksDBIterator to null, to be GC.
+    rocksDBIterator = null;
+    // 100 times gc, the rocksDBIterator should be GCed.
+    int count = 0;
+    while (count < 100 && !reference.refersTo(null)) {
+      System.gc();
+      count++;
+      Thread.sleep(100);
+    }
+    // check rocksDBIterator should be GCed
+    assertTrue(reference.refersTo(null));
+    // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
+    assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
+
+    dbForCleanerTest.close();

Review Comment:
   OK, it's fine as long as we can ensure there are no residues after testing.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372946640


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   When is `levelDBIterator` going to be `null` ?
   Also, instead of `equals` here shouldnt it not be instance check (`==`) ?
   
   +CC @LuciferYang, looks like there is a leak in the DB, iterator's weak ref is never cleaned up from the tracker ?
   



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   This is changing the locking semantics. Is this intentional ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372676965


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,24 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);
+      });
+      synchronized (levelDB.getLevelDB()) {
+        DB _db = levelDB.getLevelDB().get();
+        if (_db == null) {

Review Comment:
   thanks, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371239595


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,40 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    for (int i = 0; i < 8192; i++) {
+      dbForCleanerTest.write(createCustomType1(i));
+    }
+
+    RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+    Reference<RocksDBIterator<?>> reference = dbForCleanerTest.getRocksDBIteratorRef(rocksDBIterator);
+    assertNotNull(reference);
+    RocksIterator it = rocksDBIterator.internalIterator();
+    // it has not been closed yet, isOwningHandle should be true.
+    assertTrue(it.isOwningHandle());
+    // Manually set rocksDBIterator to null, to be GC.
+    rocksDBIterator = null;
+    // 100 times gc, the rocksDBIterator should be GCed.
+    int count = 0;
+    while (count < 100 && !reference.refersTo(null)) {
+      System.gc();
+      count++;
+      Thread.sleep(100);
+    }
+    // check rocksDBIterator should be GCed
+    assertTrue(reference.refersTo(null));
+    // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
+    assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
+
+    dbForCleanerTest.close();

Review Comment:
   Let's move `dbForCleanerTest.close()` to finally block and t would be better to add `FileUtils.deleteQuietly(dbPathForCleanerTest);` after close in finally block



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372627241


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +188,16 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
+      cleanable.clean();;

Review Comment:
   Thanks, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1780625609

   > > SKIP_PACKAGING
   > 
   > done
   
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394277068


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   It was originally in finalize method, now it has been moved to ResourceCleaner.run. I think this logic is consistent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391368440


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   Yes, it's a no-op. Even if we don't call it, the Cleaner will invoke the clean method on objects that have been garbage collected. Personally, I prefer to have the Cleaner execute more effective resource reclamation when it calls ResourceCleaner, rather than such no-ops.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1396681491


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +302,43 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final DBIterator dbIterator;
+
+    private final LevelDB levelDB;
+
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) {
+      this.dbIterator = dbIterator;
+      this.levelDB = levelDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        levelDB.notifyIteratorClosed(dbIterator);
+        synchronized (levelDB.getLevelDB()) {
+          DB _db = levelDB.getLevelDB().get();
+          if (_db != null) {
+            try {
+              dbIterator.close();
+            } catch (IOException e) {
+              throw new UncheckedIOException(e);

Review Comment:
   nit: Personally, perhaps printing a log is enough, as there should be no other places capturing this exception now.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1376483755


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   > I personally prefer to keep the method of manually closing as it is through some refactoring, without locking the db.
   
   sorry, I don't undenstand this implementation how to do it?
   ```
   /**
      * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
      * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
      */
     void closeIterator(LevelDBIterator<?> it) throws IOException {
       notifyIteratorClosed(it);
       synchronized (this._db) {
         DB _db = this._db.get();
         if (_db != null) {
           it.close();
         }
       }
     }
   ``` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374106447


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   sorry for late, do we need to change anything?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374090070


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   > Given this is enhancement for finalize -> Cleaner, it is fine to simply use existing semantics and change this condition to simply dbIterator == levelDBIterator.it, thoughts ?
   
   hmm... `levelDBIterator` is `ref.get`, if `levelDBIterator` is null, `levelDBIterator.it` will throw NPE ...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1399081856


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   Great, I see.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1370432372


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +270,32 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        iteratorTracker.removeIf(ref -> ref.get() != null && rocksIterator.equals(ref.get().it));

Review Comment:
   thanks, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1778423067

   > @zhaomin1423
   > 
   > Firstly, we may need to add two helper methods:
   > 
   > 1. Add a method in `RocksDB` that can get its corresponding `WeakReference` through `RocksDBIterator`, here I assume it's called `iteratorReference`
   > 2. Add a method in `RocksDBIterator` to get `RocksIterator`, here I assume it's called `internalIterator`
   > 
   > Then, the test case may like the following:
   > 
   > ```java
   > @Test
   > public void testResourceCleaner() throws Exception {
   >   File dbPathForCleanerTest = File.createTempFile(
   >     "test_db_cleaner.", ".rdb");
   >   dbPathForCleanerTest.delete();
   > 
   >   RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
   >   for (int i = 0; i < 8192; i++) {
   >     dbForCleanerTest.write(createCustomType1(i));
   >   }
   > 
   >   RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
   >   Optional<Reference<RocksDBIterator<?>>> referenceOpt = dbForCleanerTest.iteratorReference(rocksDBIterator);
   >   assertTrue(referenceOpt.isPresent());
   >   RocksIterator it = rocksDBIterator.internalIterator();
   >   assertTrue(it.isOwningHandle()); // it has not been closed yet, isOwningHandle should be true.
   >   rocksDBIterator = null; // Manually set rocksDBIterator to null, to be GC.
   >   Reference<RocksDBIterator<?>> ref = referenceOpt.get();
   >   // 100 times gc, the rocksDBIterator should be GCed.
   >   int count = 0;
   >   while (count < 100 && !ref.refersTo(null)) {
   >     System.gc();
   >     count++;
   >     Thread.sleep(100);
   >   }
   >   assertTrue(ref.refersTo(null)); // check rocksDBIterator should be GCed
   >   // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
   >   assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
   > 
   >   dbForCleanerTest.close();
   > }
   > ```
   > 
   > Of course, we can also add a state in `ResourceCleaner` to indicate whether it has been executed, initially set to false, and turns true after execution. These are just some of my thoughts, there might be better ways to test.
   
   Adding a state is simpler and works for different classes, but I don't know if adding variables for testing is a common approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371234514


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java:
##########
@@ -506,5 +506,4 @@ private List<CustomType1> sortBy(Comparator<CustomType1> comp) {
     Collections.sort(copy, comp);
     return copy;
   }
-

Review Comment:
   please revert this chanage



##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBIteratorSuite.java:
##########
@@ -44,5 +44,4 @@ protected KVStore createStore() throws Exception {
     db = new RocksDB(dbpath);
     return db;
   }
-

Review Comment:
   please revert this chanage



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371089595


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -50,6 +56,8 @@ class RocksDBIterator<T> implements KVStoreIterator<T> {
     this.ti = db.getTypeInfo(type);
     this.index = ti.index(params.index);
     this.max = params.max;
+    this.cleanable = CLEANER.register(this,
+        new RocksDBIterator.ResourceCleaner(it, db.getRocksDB(), db.getIteratorTracker()));

Review Comment:
   Indentation: 2 spaces



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371333377


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,22 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {
+    return iteratorTracker;
+  }
+
+  @VisibleForTesting

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372542229


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +188,16 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
+      cleanable.clean();;

Review Comment:
   An extra semicolon.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372627076


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java:
##########
@@ -383,6 +385,51 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    LevelDB dbForCleanerTest = new LevelDB(dbPathForCleanerTest);
+    try {
+      for (int i = 0; i < 8192; i++) {
+        dbForCleanerTest.write(createCustomType1(i));
+      }
+      LevelDBIterator<CustomType1> levelDBIterator = (LevelDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+      Reference<LevelDBIterator<?>> reference = getRocksDBIteratorRef(levelDBIterator, dbForCleanerTest);
+      assertNotNull(reference);
+      DBIterator it = levelDBIterator.internalIterator();
+      // it has not been closed yet, hasNext execute normally.
+      assertTrue(it.hasNext());
+      // Manually set rocksDBIterator to null, to be GC.
+      levelDBIterator = null;
+      // 100 times gc, the rocksDBIterator should be GCed.
+      int count = 0;
+      while (count < 100 && !reference.refersTo(null)) {
+        System.gc();
+        count++;
+        Thread.sleep(100);
+      }
+      // check rocksDBIterator should be GCed
+      assertTrue(reference.refersTo(null));
+      // Verify that the Cleaner will be executed after a period of time, there is an exception message. java.lang.AssertionError: This object has been deleted
+      assertThrows(AssertionError.class, it::hasNext, "This object has been deleted");

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377001120


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   @zhaomin1423 Overall, we can make the close method of `RocksDBIterator` look similar. The `notifyIteratorClosed ` method in `RocksDB`  needs to be retained (it can be refactored for reuse by `ResourceCleaner`).
   
   ```
     public synchronized void close() throws IOException {
       db.notifyIteratorClosed(this);
       if (!closed) {
         it.close();
         closed = true;
         next = null;
         this.cancelXXX();
       }
     }
   ```
   
   The purpose of `this.cancelXXX();`  is to prevent `ResourceCleaner` from executing repeatedly when manually closed.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1808299310

   @zhaomin1423 Do you have time to rebase the code and make changes to the LevelDB section?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394375372


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   The logic is consistent - and would remain the same with and without this move - I was wondering why the change ? Whether this was simply refactoring or there is any other reason for it.
   
   (Note: I am not advocating for reverting, just want to understand why this was done).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1815743637

   There is one [open query](https://github.com/apache/spark/pull/43502#discussion_r1391335306) above, no ? (I skimmed the last few conversations)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1373003887


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   > looks like there is a leak in the DB, iterator's weak ref is never cleaned up from the tracker ?
   
   
   I think this should be a old problem, and we have also performed null checks in the close method of RocksDB.
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java#L341-L348
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java#L307-L315
   
   If an iterator is no longer referenced and is not manually closed, then if GC occurs at this time, the iteratorTracker will still hold the Ref, but Ref#get is null.
   
   Should we change this condition to 
   
   ```
   return levelDBIterator == null || dbIterator.equals(levelDBIterator.it); 
   ```
   
   Can this also clean up the invalid weakref in passing?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1373003887


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   ```
   looks like there is a leak in the DB, iterator's weak ref is never cleaned up from the tracker ?
   ```
   
   I think this should be a old problem, and we have also performed null checks in the close method of RocksDB.
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java#L341-L348
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java#L307-L315
   
   If an iterator is no longer referenced and is not manually closed, then if GC occurs at this time, the iteratorTracker will still hold the Ref, but Ref#get is null.
   
   Should we change this condition to 
   
   ```
   return levelDBIterator == null || dbIterator.equals(levelDBIterator.it); 
   ```
   
   Can this also clean up the invalid weakref in passing?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1373003887


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   > looks like there is a leak in the DB, iterator's weak ref is never cleaned up from the tracker ?
   
   
   I think this should be a old problem, and we have also performed null checks in the close method of RocksDB.
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java#L341-L348
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java#L307-L315
   
   If an iterator is no longer referenced and is not manually closed, then if GC occurs at this time, the iteratorTracker will still hold the Ref, but Ref#get is null.
   
   Should we change this condition to 
   
   ```
   return levelDBIterator == null || dbIterator == levelDBIterator.it; 
   ```
   
   Can this also clean up the invalid weakref in passing?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1808293353

   friendly ping @mridulm @beliefer Is the current change to the RocksDB section ok for you? Should we continue to push for changes in this pr?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1398084223


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +302,43 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final DBIterator dbIterator;
+
+    private final LevelDB levelDB;
+
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) {
+      this.dbIterator = dbIterator;
+      this.levelDB = levelDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        levelDB.notifyIteratorClosed(dbIterator);
+        synchronized (levelDB.getLevelDB()) {
+          DB _db = levelDB.getLevelDB().get();
+          if (_db != null) {
+            try {
+              dbIterator.close();
+            } catch (IOException e) {
+              throw new UncheckedIOException(e);

Review Comment:
   Good suggestion, I re-examined it,the JniDBIterator  can't throw an IOException, could we just ignore it?
   
   <img width="632" alt="image" src="https://github.com/apache/spark/assets/49054376/4c5c9e6c-4c06-4175-9a1f-e12577a6f86a">
   
   



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +302,43 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final DBIterator dbIterator;
+
+    private final LevelDB levelDB;
+
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) {
+      this.dbIterator = dbIterator;
+      this.levelDB = levelDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        levelDB.notifyIteratorClosed(dbIterator);
+        synchronized (levelDB.getLevelDB()) {
+          DB _db = levelDB.getLevelDB().get();
+          if (_db != null) {
+            try {
+              dbIterator.close();
+            } catch (IOException e) {
+              throw new UncheckedIOException(e);

Review Comment:
   Good suggestion, I re-examined it,the JniDBIterator  can't throw an IOException, could we just ignore it?
   
   <img width="632" alt="image" src="https://github.com/apache/spark/assets/49054376/4c5c9e6c-4c06-4175-9a1f-e12577a6f86a">
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377858576


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   Get, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1378067392


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -180,13 +183,20 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
+    db.notifyIteratorClosed(this);
     if (!closed) {
-      cleanable.clean();
+      it.close();

Review Comment:
   ```java
   try {
   it.close();
   closed = true;
   next = null;
   } finally {
   cancelResourceClean();
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1379542967


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +287,32 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        rocksIterator.close();

Review Comment:
   I think this close still needs synchronized db...
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1379542967


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +287,32 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        rocksIterator.close();

Review Comment:
   I think this close still needs synchronized....
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377851138


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   ```
   private void cancelXXX() {
       this.resourceCleaner.statusToFalse();
       this.cleanable.clean();
     }
   ```
   is ```this.cleanable.clean() ``` redundant? it can't be executed because the status is false.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1382330007


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,55 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    try {
+      for (int i = 0; i < 8192; i++) {
+        dbForCleanerTest.write(createCustomType1(i));
+      }
+      RocksDBIterator<CustomType1> rocksDBIterator =
+        (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+      Reference<RocksDBIterator<?>> reference =

Review Comment:
   In this case, since we didn't manually close the `RocksDBIterator`, we can ultimately determine whether the `RocksDBIterator` is closed through `resourceCleaner.getStatus()`. 
   
   So here, `reference` is just an auxiliary tool to judge whether the `RocksDBIterator` has been GCed. So can we change it to manually wrap a `WeakReference` instead of getting it from the `iteratorTracker`,like `new WeakReference<>(rocksDBIterator) `, so that we don't need to add an `Accessor` for the `iteratorTracker`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377856270


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   This is to have it removed from the Cleaner's queue as quickly as possible.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1382625914


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +290,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        synchronized (rocksDB.getRocksDB()) {
+          org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
+          if (_db != null) {
+            rocksIterator.close();
+          }
+        }
+      }
+    }
+
+    void setStartedToFalse() {
+      started.set(false);
+    }
+
+    @VisibleForTesting
+    boolean getStarted() {

Review Comment:
   ```java
   boolean isCompleted() {
     return !started.get();
   }
   ```
    ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1392615323


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {

Review Comment:
   It should be impossible to catch an exception of type  `UncheckedIOException` here.`
   
   



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {

Review Comment:
   It should be impossible to catch an exception of type  `UncheckedIOException` here now
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369630645


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -285,6 +288,7 @@ public Iterator<T> iterator() {
         try {
           RocksDBIterator<T> it = new RocksDBIterator<>(type, RocksDB.this, this);
           iteratorTracker.add(new WeakReference<>(it));
+          CLEANER.register(it, new RocksDBIterator.ResourceCleaner(it.getRocksIterator(), _db, iteratorTracker));

Review Comment:
   I believe we should move `CLEANER` to RocksIterator, and also place the registration process in the constructor of `RocksDBIterator`. Considering the extensibility in the future, we should not assume that it will only be registered in this one place.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369630645


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -285,6 +288,7 @@ public Iterator<T> iterator() {
         try {
           RocksDBIterator<T> it = new RocksDBIterator<>(type, RocksDB.this, this);
           iteratorTracker.add(new WeakReference<>(it));
+          CLEANER.register(it, new RocksDBIterator.ResourceCleaner(it.getRocksIterator(), _db, iteratorTracker));

Review Comment:
   I think we should move `CLEANER` to RocksIterator, and also place the registration process in the constructor of `RocksDBIterator`. Considering the extensibility in the future, we should not assume that it will only be registered in this one place.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371084374


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +273,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      iteratorTracker.removeIf(ref -> {
+        RocksDBIterator<?> rocksDBIterator = ref.get();
+        if (rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it)) {

Review Comment:
   ```
   if (rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it)) {
             return true;
           } else {
             return false;
           }
   ```
   ->
   
   ```
   return rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it);
   ```
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374040714


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   w.r.t change in logic ... in current code, I dont think `levelDB.getIteratorTracker()` will ever contain a `null` within it (the ref it points to can be `null`).
   
   Given this is enhancement for `finalize` -> `Cleaner`, it is fine to simply use existing semantics and change this condition to simply `dbIterator == levelDBIterator.it`, thoughts ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374056091


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   Agree +1, we can fix the leak in follow up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374102628


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   You are right, I missed that levelDBIterator is itself ref.get ... sigh, sorry about this !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1777630300

   @zhaomin1423 
   
   Firstly, we may need to add two helper methods:
   1. Add a method in `RocksDB` that can get its corresponding `WeakReference` through `RocksDBIterator`, here I assume it's called `iteratorReference`
   2. Add a method in `RocksDBIterator` to get `RocksIterator`, here I assume it's called `internalIterator`
   
   Then, the test case may like the following:
   
     ```java
     @Test
     public void testResourceCleaner() throws Exception {
       File dbPathForCleanerTest = File.createTempFile(
         "test_db_cleaner.", ".rdb");
       dbPathForCleanerTest.delete();
   
       RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
       for (int i = 0; i < 8192; i++) {
         dbForCleanerTest.write(createCustomType1(i));
       }
   
       RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
       Optional<Reference<RocksDBIterator<?>>> referenceOpt = dbForCleanerTest.iteratorReference(rocksDBIterator);
       assertTrue(referenceOpt.isPresent());
       RocksIterator it = rocksDBIterator.internalIterator();
       assertTrue(it.isOwningHandle()); // it has not been closed yet, isOwningHandle should be true.
       rocksDBIterator = null; // Manually set rocksDBIterator to null, to be GC.
       Reference<RocksDBIterator<?>> ref = referenceOpt.get();
       // 100 times gc, the rocksDBIterator should be GCed.
       int count = 0;
       while (count < 100 && !ref.refersTo(null)) {
         System.gc();
         count++;
         Thread.sleep(100);
       }
       assertTrue(ref.refersTo(null)); // check rocksDBIterator should be GCed
       // Verify that the Cleaner will be executed after a period of time, after which it.isOwningHandle() will become false.
       assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
   
       dbForCleanerTest.close();
     }
     ```
   
   Of course, we can also add a state in `ResourceCleaner` to indicate whether it has been executed, initially set to false, and turns true after execution. These are just some of my thoughts, there might be better ways to test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371100897


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +273,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {

Review Comment:
   can be a `record`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372643667


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,24 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);
+      });
+      synchronized (levelDB.getLevelDB()) {
+        DB _db = levelDB.getLevelDB().get();
+        if (_db == null) {

Review Comment:
   What are the advantages compared to 
   
   ```java
   if (_db != null) {
   try {
             dbIterator.close();
           } catch (IOException e) {
             throw new UncheckedIOException(e);
           }
   }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1375212924


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   I personally prefer to keep the method of manually closing as it is through some refactoring, without locking the db.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391383180


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   However, I indeed don't have valid test data yet to support my viewpoint.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1396674249


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,34 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from trying to release resources after close.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();
+  }
+
+  @VisibleForTesting

Review Comment:
   this is used by `notifyIteratorClosed`, not `@VisibleForTesting` now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1815774008

   > There is one [open query](https://github.com/apache/spark/pull/43502#discussion_r1391335306) above, no ? (I skimmed the last few conversations)
   
   Sorry, I missing this query. We should reach a consensus on this before merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1780616800

   > SKIP_PACKAGING
   
   done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374038863


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   I am not necessarily saying it is an incorrect change - I have not analyzed in detail.
   But in context of `Cleaner` related change, I would have expected the modifications to be equivalent and not require MT safety impact :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391358903


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -176,22 +183,33 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+        closed = true;
+        next = null;

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391380331


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   My concern is: if we don't manually call the `clean` method here, when the `Cleaner` actively performs GC, it will accumulate a large number of 'no-op' calls, which prevents tasks that really need to be recycled from being executed as soon as possible.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371259731


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,40 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    for (int i = 0; i < 8192; i++) {
+      dbForCleanerTest.write(createCustomType1(i));
+    }
+
+    RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+    Reference<RocksDBIterator<?>> reference = dbForCleanerTest.getRocksDBIteratorRef(rocksDBIterator);
+    assertNotNull(reference);
+    RocksIterator it = rocksDBIterator.internalIterator();
+    // it has not been closed yet, isOwningHandle should be true.
+    assertTrue(it.isOwningHandle());
+    // Manually set rocksDBIterator to null, to be GC.
+    rocksDBIterator = null;
+    // 100 times gc, the rocksDBIterator should be GCed.
+    int count = 0;
+    while (count < 100 && !reference.refersTo(null)) {
+      System.gc();
+      count++;
+      Thread.sleep(100);
+    }
+    // check rocksDBIterator should be GCed
+    assertTrue(reference.refersTo(null));
+    // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
+    assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
+
+    dbForCleanerTest.close();

Review Comment:
   sorry, if remove it, there is an exception.
   ```
   org.rocksdb.RocksDBException: `/Users/zhaomin/Files/project/spark/common/kvstore/target/tmp/test_db_cleaner.6329921973463604622.rdb' exists but is not a directory
   
   	at org.rocksdb.RocksDB.open(Native Method)
   	at org.rocksdb.RocksDB.open(RocksDB.java:249)
   	at org.apache.spark.util.kvstore.RocksDB.<init>(RocksDB.java:122)
   	at org.apache.spark.util.kvstore.RocksDB.<init>(RocksDB.java:116)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369664549


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +270,32 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        iteratorTracker.removeIf(ref -> ref.get() != null && rocksIterator.equals(ref.get().it));

Review Comment:
   ```
   iteratorTracker.removeIf(ref -> ref.get() != null && rocksIterator.equals(ref.get().it));
   ```
   Maybe this part can be placed outside of `synchronized `?
   
   



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -36,20 +41,24 @@ class RocksDBIterator<T> implements KVStoreIterator<T> {
   private final byte[] indexKeyPrefix;
   private final byte[] end;
   private final long max;
+  private final Cleaner.Cleanable cleanable;
 
   private boolean checkedNext;
   private byte[] next;
   private boolean closed;
   private long count;
 
-  RocksDBIterator(Class<T> type, RocksDB db, KVStoreView<T> params) throws Exception {
+  RocksDBIterator(Class<T> type, RocksDB db, KVStoreView<T> params,
+      ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker,

Review Comment:
   I'm more inclined to add two helper methods in RocksDB to get `iteratorTracker` and `_db`, but I'm also open to other suggestions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371086639


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,21 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  public AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  public ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {
+    return iteratorTracker;
+  }
+
+  public Optional<Reference<RocksDBIterator<?>>> iteratorReference(RocksDBIterator<?> rocksDBIterator) {

Review Comment:
   This method should add `@VisibleForTesting`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1398042213


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   My understanding is that it is necessary to hold their references, and I have not yet thought of any other way to convey them



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1821074054

   Merged into master for Spark 4.0.
   Thanks @zhaomin1423 ! Your work is greatly appreciated.
   Thank you for your review! @mridulm @beliefer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377681397


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   Thanks, sorry for late because I am busy recently, I will update it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1382625914


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +290,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        synchronized (rocksDB.getRocksDB()) {
+          org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
+          if (_db != null) {
+            rocksIterator.close();
+          }
+        }
+      }
+    }
+
+    void setStartedToFalse() {
+      started.set(false);
+    }
+
+    @VisibleForTesting
+    boolean getStarted() {

Review Comment:
   ```java
   boolean isCompleted() {
     return !started.get();
   }
   ``` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1794924200

   Rocksdb part is fine to me. friendly ping  @mridulm @beliefer Ccould you take a look again if you have time? Thanks 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1382363360


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,23 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  AtomicReference<org.rocksdb.RocksDB> getRocksDB() {

Review Comment:
   Added, how about?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394375372


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   The logic is consistent - and would remain the same with and without this move - I was wondering why the change ? Whether this was simply refactoring or there is any other reason for it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1392651580


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   What is your opinion on the comments here? @zhaomin1423 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1396681491


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +302,43 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final DBIterator dbIterator;
+
+    private final LevelDB levelDB;
+
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) {
+      this.dbIterator = dbIterator;
+      this.levelDB = levelDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        levelDB.notifyIteratorClosed(dbIterator);
+        synchronized (levelDB.getLevelDB()) {
+          DB _db = levelDB.getLevelDB().get();
+          if (_db != null) {
+            try {
+              dbIterator.close();
+            } catch (IOException e) {
+              throw new UncheckedIOException(e);

Review Comment:
   nit: Personally, perhaps printing a log is enough, as there should be no other places capturing this exception now.  but this is not important
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1815716635

   @mridulm If there are no other questions, I will merge this PR as soon as possible :)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371086370


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,21 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  public AtomicReference<org.rocksdb.RocksDB> getRocksDB() {

Review Comment:
   please remove `public`



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,21 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  public AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  public ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {

Review Comment:
   ditto



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,21 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  public AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  public ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {
+    return iteratorTracker;
+  }
+
+  public Optional<Reference<RocksDBIterator<?>>> iteratorReference(RocksDBIterator<?> rocksDBIterator) {

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369665821


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -36,20 +41,24 @@ class RocksDBIterator<T> implements KVStoreIterator<T> {
   private final byte[] indexKeyPrefix;
   private final byte[] end;
   private final long max;
+  private final Cleaner.Cleanable cleanable;
 
   private boolean checkedNext;
   private byte[] next;
   private boolean closed;
   private long count;
 
-  RocksDBIterator(Class<T> type, RocksDB db, KVStoreView<T> params) throws Exception {
+  RocksDBIterator(Class<T> type, RocksDB db, KVStoreView<T> params,
+      ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker,

Review Comment:
   I'm more inclined to add two helper methods in RocksDB to get `iteratorTracker` and `_db` rather than modifying the constructor , but I'm also open to other suggestions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369710753


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,20 +355,6 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {

Review Comment:
   `notifyIteratorClosed` can also be deleted.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371239368


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java:
##########
@@ -506,5 +506,4 @@ private List<CustomType1> sortBy(Comparator<CustomType1> comp) {
     Collections.sort(copy, comp);
     return copy;
   }
-

Review Comment:
   done



##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBIteratorSuite.java:
##########
@@ -44,5 +44,4 @@ protected KVStore createStore() throws Exception {
     db = new RocksDB(dbpath);
     return db;
   }
-

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394277068


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   It was originally used in finalize method, but now it has been moved to ResourceCleaner.run. I think this logic is consistent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1821078747

   > Merged into master for Spark 4.0. Thanks @zhaomin1423 ! Your work is greatly appreciated. Thank you for your review! @mridulm @beliefer
   
   Thank you very much for your help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374040714


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   w.r.t change in logic ... in current code, I dont think `levelDB.getIteratorTracker()` will ever contain a `null` within it (the ref it points to can be `null`).
   
   Given this is enhancement for `finalize` -> `Cleaner`, it is fine to simply use existing semantics and change this condition to simply `dbIterator == levelDBIterator.it`, thoughts ?
   
   We can fix the leak in a subsequent pr



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374039278


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   With respect to `WeakReference` accumulating in that list, agree - that is an existing problem - this PR is not changing it.
   I noticed it while reviewing, and so pinged you @LuciferYang :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374056091


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   Agree +1, we can fix it in follow up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374094014


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   I simplified the condition of rocksdb part  into just  `dbIterator == rocksDBIterator.it` for local testing, there will be many test failures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1778891511

   friendly ping @dongjoon-hyun @mridulm @beliefer Could you please take another look at this pr  if you have time? Thanks ~
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371068189


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +269,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        rocksIterator.close();
+      }
+      iteratorTracker.removeIf(ref -> {
+          RocksDBIterator<?> rocksDBIterator = ref.get();
+          if (rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it)) {
+            return true;
+          } else {
+            return false;
+          }
+        });

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391335306


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   Any particular reason to move this into `ResourceCleaner.run` ?



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.

Review Comment:
   nit: 
   ```suggestion
      * Prevent ResourceCleaner from trying to release resources after close.
   ```



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -176,22 +183,33 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+        closed = true;
+        next = null;

Review Comment:
   Keep both level db and rocks db part consistent w.r.t finally ? (move setting closed/next to finally ?)



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   We have already notified and closed the iterator before this method is invoked - and we are forcing a clean again ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391963989


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   It makes sense, let us keep it as is - I wanted to make sure I was not missing something !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1392669927


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {

Review Comment:
   thanks,done



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {

Review Comment:
   thanks,done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1400076838


##########
.github/workflows/build_and_test.yml:
##########
@@ -200,6 +200,7 @@ jobs:
       SKIP_UNIDOC: true
       SKIP_MIMA: true
       SKIP_PACKAGING: true
+      LIVE_UI_LOCAL_STORE_DIR: "/tmp/kvStore"

Review Comment:
   @zhaomin1423 Please revert this change, thanks ~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1398623958


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   We can revert change of `closeIterator` method in `LevelDB` and refactor it as :
   
   ```java
     void closeIterator(DBIterator it) throws IOException {
       notifyIteratorClosed(it);
       synchronized (this._db) {
         DB _db = this._db.get();
         if (_db != null) {
           it.close();
         }
       }
     }
   ```
   
   then there we can call `closeIterator` directly like:
   
   ```
   public void run() {
         if (started.compareAndSet(true, false)) {
           try {
             levelDB.closeIterator(dbIterator);
           } catch (IOException e) {
             // logXXX("Failed to close iterator", e);
           }
         }
       }
   ```
   
   Do you mean this idea? @mridulm 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1808412311

   > @zhaomin1423 Do you have time to rebase the code and make changes to the LevelDB section?
   > 
   > 
   > 
   > 
   
   PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1370432695


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -36,20 +41,24 @@ class RocksDBIterator<T> implements KVStoreIterator<T> {
   private final byte[] indexKeyPrefix;
   private final byte[] end;
   private final long max;
+  private final Cleaner.Cleanable cleanable;
 
   private boolean checkedNext;
   private byte[] next;
   private boolean closed;
   private long count;
 
-  RocksDBIterator(Class<T> type, RocksDB db, KVStoreView<T> params) throws Exception {
+  RocksDBIterator(Class<T> type, RocksDB db, KVStoreView<T> params,
+      ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker,

Review Comment:
   thanks, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1777530901

   > How do we finish the tests?
   
   I haven’t thought of a good testing method yet, do you have any suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1780536278

   @zhaomin1423 The code changes are fine to me.
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/.github/workflows/build_and_test.yml#L202
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/.github/workflows/build_and_test.yml#L381
   
   Can you add an environment variable `LIVE_UI_LOCAL_STORE_DIR: "/tmp/kvStore"` at the above two places? This will additionally test using RocksDB as LiveUI Backend. I want to verify that this pr do not affect Rocksdb LiveUI feature. We can remove this change after the test is successful.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371011690


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +269,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        rocksIterator.close();
+      }
+      iteratorTracker.removeIf(ref -> {
+          RocksDBIterator<?> rocksDBIterator = ref.get();
+          if (rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it)) {
+            return true;
+          } else {
+            return false;
+          }
+        });

Review Comment:
   It seems we should remove reference for iteratorTracker first, then close rocksIterator?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1378065031


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -270,20 +280,30 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
-  private record ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) implements Runnable {
+  private static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
 
     @Override
     public void run() {
-      rocksDB.getIteratorTracker().removeIf(ref -> {
-        RocksDBIterator<?> rocksDBIterator = ref.get();
-        return rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it);
-      });
-      synchronized (rocksDB.getRocksDB()) {
-        org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
-        if (_db != null) {
-          rocksIterator.close();
-        }
+      if (status.compareAndSet(true, false)) {
+        rocksDB.getIteratorTracker().removeIf(ref -> {

Review Comment:
   Is it possible to reuse the `db.notifyIteratorClosed`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1378942557


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -270,20 +280,30 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
-  private record ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) implements Runnable {
+  private static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
 
     @Override
     public void run() {
-      rocksDB.getIteratorTracker().removeIf(ref -> {
-        RocksDBIterator<?> rocksDBIterator = ref.get();
-        return rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it);
-      });
-      synchronized (rocksDB.getRocksDB()) {
-        org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
-        if (_db != null) {
-          rocksIterator.close();
-        }
+      if (status.compareAndSet(true, false)) {
+        rocksDB.getIteratorTracker().removeIf(ref -> {

Review Comment:
   Thanks, updated, levels will update after rocksdb complete



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1380093683


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +287,32 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        rocksIterator.close();

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1382327255


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,23 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  AtomicReference<org.rocksdb.RocksDB> getRocksDB() {

Review Comment:
   Some comments need to be added to explain the intent and scope of use of this method.



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,23 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
+  }
+
+  ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {

Review Comment:
   ditto



##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,55 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    try {
+      for (int i = 0; i < 8192; i++) {
+        dbForCleanerTest.write(createCustomType1(i));
+      }
+      RocksDBIterator<CustomType1> rocksDBIterator =
+        (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+      Reference<RocksDBIterator<?>> reference =

Review Comment:
   In this case, since we didn't manually close the `RocksDBIterator`, we can ultimately determine whether the `RocksDBIterator` is closed through `resourceCleaner.getStatus()`. 
   
   So here, `reference` is just an auxiliary tool to judge whether the `RocksDBIterator` has been GCed. So can we change it to manually wrap a `WeakReference` instead of getting it from the `iteratorTracker`, so that we don't need to add an `Accessor` for the `iteratorTracker`?



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +287,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        synchronized (rocksDB.getRocksDB()) {
+          org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
+          if (_db != null) {
+            rocksIterator.close();
+          }
+        }
+      }
+    }
+
+    void statusToFalse() {
+      status.set(false);
+    }
+
+    @VisibleForTesting
+    AtomicBoolean getStatus() {

Review Comment:
   I suggest that
   
   ```java
   boolean getStatus() {
     return status.get();
    }
   ```



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +287,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);

Review Comment:
   Is there a more suitable variable name?
   
   



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -176,22 +183,30 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+        closed = true;
+        next = null;
+      } finally {
+        cancelResourceClean();
+      }
     }
   }
 
-  /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
-   */
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.statusToFalse();
+    this.cleanable.clean();
+  }
+
+  @VisibleForTesting
+  ResourceCleaner getResourceCleaner() {
+    return resourceCleaner;
+  }
+
+  public RocksIterator getRocksIterator() {

Review Comment:
   can we remove the `public` and I like `internalIterator`?



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +287,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        synchronized (rocksDB.getRocksDB()) {
+          org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
+          if (_db != null) {
+            rocksIterator.close();
+          }
+        }
+      }
+    }
+
+    void statusToFalse() {

Review Comment:
   Is there a more suitable method name?



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -176,22 +183,30 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+        closed = true;
+        next = null;
+      } finally {
+        cancelResourceClean();
+      }
     }
   }
 
-  /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
-   */
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {

Review Comment:
   Method comments need to be added.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1382327696


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -176,22 +183,30 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+        closed = true;
+        next = null;
+      } finally {
+        cancelResourceClean();
+      }
     }
   }
 
-  /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
-   */
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.statusToFalse();
+    this.cleanable.clean();
+  }
+
+  @VisibleForTesting
+  ResourceCleaner getResourceCleaner() {
+    return resourceCleaner;
+  }
+
+  public RocksIterator getRocksIterator() {

Review Comment:
   can we remove the `public`? and I like `internalIterator`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1376442012


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   > I personally prefer to keep the method of manually closing as it is through some refactoring, without locking the db.
   
   WDYT? @mridulm @beliefer @zhaomin1423 Thanks ~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1378970124


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +283,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.getIteratorTracker().forEach(ref -> {

Review Comment:
   hmm... Why not just change the input arg type of the `rocksDB.notifyIteratorClosed` method...
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1397140046


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   Are you worried about the impact of ResourceCleaner on them being cleaned up by the JVM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1396681491


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +302,43 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final DBIterator dbIterator;
+
+    private final LevelDB levelDB;
+
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) {
+      this.dbIterator = dbIterator;
+      this.levelDB = levelDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        levelDB.notifyIteratorClosed(dbIterator);
+        synchronized (levelDB.getLevelDB()) {
+          DB _db = levelDB.getLevelDB().get();
+          if (_db != null) {
+            try {
+              dbIterator.close();
+            } catch (IOException e) {
+              throw new UncheckedIOException(e);

Review Comment:
   nit: Personally: Perhaps printing a log is enough, as there should be no other places capturing this exception now.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394409363


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,34 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();

Review Comment:
   This was perhaps discussed earlier - please point me to it if I am missing it.
   Why not simply call ` this.cleanable.clean();` in `close` (in place of notify + it.close) ?



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   resource cleaner has a reference to both `dbIterator` and `levelDB` - not sure what I am missing ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1400113583


##########
.github/workflows/build_and_test.yml:
##########
@@ -200,6 +200,7 @@ jobs:
       SKIP_UNIDOC: true
       SKIP_MIMA: true
       SKIP_PACKAGING: true
+      LIVE_UI_LOCAL_STORE_DIR: "/tmp/kvStore"

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1392600591


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -176,22 +183,33 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+        closed = true;
+        next = null;

Review Comment:
   Thanks, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394398171


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   The main reason is ResourceCleaner can't hold references to the LevelDBIterator, so change it. If you have a better suggestion, please let me know and I will be happy to change it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391361132


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   In other words, what I am trying to understand is (should have phrased it better :) ) what is the overhead/issues with cleaner ending up invoking this as its normal case.
   
   (Agree with your rationale btw - makes sense to evict it from queue).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391359057


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   Yes, but the clean is a noop right (since started has been force set to `false`) ? Is there a reason to call it ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391363608


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   Good point, this way we can reduce some code changes.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391380331


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   My concern is: if we don't manually call the `clean` method here, when the `Cleaner` actively performs GC, it will accumulate a large number of `no-op` calls, which prevents tasks that really need to be recycled from being executed as soon as possible.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1776651949

   Although the current pr only fixes part of RocksDB now, I suggest adding the LevelDB part in this PR once the RocksDB part is deemed feasible. I prefer to complete them in one PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1776517170

   cc @LuciferYang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369709362


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +270,32 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        iteratorTracker.removeIf(ref -> ref.get() != null && rocksIterator.equals(ref.get().it));

Review Comment:
   In addition, it would be better to assign `ref.get()` to an intermediate variable. Even if the first `ref.get() != null` is true, the second call to `ref.get()` could still be null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371100897


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +273,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {

Review Comment:
   can be a `private record`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1377856270


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   This is to have it removed from the Cleaner's queue as quickly as possible, but this is not a critical operation, at least that's what I think right now.
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372643667


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,24 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);
+      });
+      synchronized (levelDB.getLevelDB()) {
+        DB _db = levelDB.getLevelDB().get();
+        if (_db == null) {

Review Comment:
   What are the advantages compared to 
   
   ```java
   if (_db != null) {
     try {
        dbIterator.close();
      } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
   }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1373038414


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   @mridulm Are you referring to the addition of `synchronized (rocksDB.getRocksDB())` in `cleanable.clean()` compared to before? Perhaps we can let the `close()` method still use the original logic as follows:
   
   1.  Add a state variable for ResourceCleaner and let RocksDB/LevelDBIterator hold an instance of ResourceCleaner, like
   ```java
   private final AtomicBoolean status = new AtomicBoolean(true);
   void statusToFalse() {
      status.set(false);
   }
   
   @Override
   public void run() {
       if (status.compareAndSet(true, false)) {
         ....
        }
   }
   ```
   
   2.  Add a method to RocksDB/LevelDBIterator to prevent ResourceCleaner from actually releasing resources, like
   
   ```java
   private void cancelXXX() {
       this.resourceCleaner.statusToFalse();
       this.cleanable.clean();
     }
   ```
   
   3. change the `close()` to 
   
   ```java
     public synchronized void close() throws IOException {
       db.notifyIteratorClosed(this);
       if (!closed) {
         it.close();
         closed = true;
         next = null;
         this.cancelXXX();
       }
     }
   ```
   
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1379017731


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +283,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean status = new AtomicBoolean(true);
+
+    public ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (status.compareAndSet(true, false)) {
+        rocksDB.getIteratorTracker().forEach(ref -> {

Review Comment:
   ```
   
   void notifyIteratorClosed(RocksIterator rocksIterator) {
       iteratorTracker.removeIf(ref -> {
         RocksDBIterator<?> rocksDBIterator = ref.get();
         return rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.getRocksIterator());
       });
     }
   
   ```
   
   How about?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1776537938

   also cc @mridulm @beliefer @srowen @dongjoon-hyun
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1369647515


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -285,6 +288,7 @@ public Iterator<T> iterator() {
         try {
           RocksDBIterator<T> it = new RocksDBIterator<>(type, RocksDB.this, this);
           iteratorTracker.add(new WeakReference<>(it));
+          CLEANER.register(it, new RocksDBIterator.ResourceCleaner(it.getRocksIterator(), _db, iteratorTracker));

Review Comment:
   Thanks, I got it, moved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371099819


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,21 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  public AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  public ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {
+    return iteratorTracker;
+  }
+
+  public Optional<Reference<RocksDBIterator<?>>> iteratorReference(RocksDBIterator<?> rocksDBIterator) {
+    for (Reference<RocksDBIterator<?>> rocksDBIteratorReference : iteratorTracker) {
+      if (rocksDBIterator == rocksDBIteratorReference.get()) {
+        return Optional.of(rocksDBIteratorReference);

Review Comment:
   It's not necessary to wrap it as Optional, the calling place can also check for non-null :)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371088961


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java:
##########
@@ -355,26 +355,21 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI RocksDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(RocksDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      org.rocksdb.RocksDB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
+  public AtomicReference<org.rocksdb.RocksDB> getRocksDB() {
+    return _db;
   }
 
-  /**
-   * Remove iterator from iterator tracker. `RocksDBIterator` calls it to notify
-   * iterator is closed.
-   */
-  void notifyIteratorClosed(RocksDBIterator<?> it) {
-    iteratorTracker.removeIf(ref -> it.equals(ref.get()));
+  public ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> getIteratorTracker() {
+    return iteratorTracker;
+  }
+
+  public Optional<Reference<RocksDBIterator<?>>> iteratorReference(RocksDBIterator<?> rocksDBIterator) {

Review Comment:
   The method name is something I came up with on the spot, there might be a more suitable one.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371102347


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +273,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,

Review Comment:
   Is it feasible to just pass in `RocksDB` and then retrieve `_db` or `iteratorTracker` when using it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371251885


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,40 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    for (int i = 0; i < 8192; i++) {
+      dbForCleanerTest.write(createCustomType1(i));
+    }
+
+    RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+    Reference<RocksDBIterator<?>> reference = dbForCleanerTest.getRocksDBIteratorRef(rocksDBIterator);
+    assertNotNull(reference);
+    RocksIterator it = rocksDBIterator.internalIterator();
+    // it has not been closed yet, isOwningHandle should be true.
+    assertTrue(it.isOwningHandle());
+    // Manually set rocksDBIterator to null, to be GC.
+    rocksDBIterator = null;
+    // 100 times gc, the rocksDBIterator should be GCed.
+    int count = 0;
+    while (count < 100 && !reference.refersTo(null)) {
+      System.gc();
+      count++;
+      Thread.sleep(100);
+    }
+    // check rocksDBIterator should be GCed
+    assertTrue(reference.refersTo(null));
+    // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
+    assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
+
+    dbForCleanerTest.close();

Review Comment:
   ```
   dbPathForCleanerTest.delete();
   ```
   can it be removed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371251885


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java:
##########
@@ -381,6 +382,40 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
+    for (int i = 0; i < 8192; i++) {
+      dbForCleanerTest.write(createCustomType1(i));
+    }
+
+    RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+    Reference<RocksDBIterator<?>> reference = dbForCleanerTest.getRocksDBIteratorRef(rocksDBIterator);
+    assertNotNull(reference);
+    RocksIterator it = rocksDBIterator.internalIterator();
+    // it has not been closed yet, isOwningHandle should be true.
+    assertTrue(it.isOwningHandle());
+    // Manually set rocksDBIterator to null, to be GC.
+    rocksDBIterator = null;
+    // 100 times gc, the rocksDBIterator should be GCed.
+    int count = 0;
+    while (count < 100 && !reference.refersTo(null)) {
+      System.gc();
+      count++;
+      Thread.sleep(100);
+    }
+    // check rocksDBIterator should be GCed
+    assertTrue(reference.refersTo(null));
+    // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
+    assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));
+
+    dbForCleanerTest.close();

Review Comment:
   ```
   dbPathForCleanerTest.delete();
   ```
   can it be removed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371068189


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +269,39 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+
+    private final AtomicReference<org.rocksdb.RocksDB> _db;
+
+    private final ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker;
+
+    public ResourceCleaner(RocksIterator rocksIterator,
+        AtomicReference<org.rocksdb.RocksDB> _db,
+        ConcurrentLinkedQueue<Reference<RocksDBIterator<?>>> iteratorTracker) {
+      this.rocksIterator = rocksIterator;
+      this._db = _db;
+      this.iteratorTracker = iteratorTracker;
+    }
+
+    @Override
+    public void run() {
+      synchronized (this._db) {
+        org.rocksdb.RocksDB _db = this._db.get();
+        if (_db == null) {
+          return;
+        }
+        rocksIterator.close();
+      }
+      iteratorTracker.removeIf(ref -> {
+          RocksDBIterator<?> rocksDBIterator = ref.get();
+          if (rocksDBIterator != null && rocksIterator.equals(rocksDBIterator.it)) {
+            return true;
+          } else {
+            return false;
+          }
+        });

Review Comment:
   +1, however, if I understand correctly, there should be no corresponding ref in iteratorTracker at this point.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1371085761


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBIteratorSuite.java:
##########
@@ -45,4 +51,37 @@ protected KVStore createStore() throws Exception {
     return db;
   }
 
+  @Test

Review Comment:
   I think it should also be feasible to put it into `RocksDBSuite`, and there's no need to redefine `CustomType1`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1383381947


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -272,4 +290,37 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  static class ResourceCleaner implements Runnable {
+
+    private final RocksIterator rocksIterator;
+    private final RocksDB rocksDB;
+    private final AtomicBoolean started = new AtomicBoolean(true);
+
+    ResourceCleaner(RocksIterator rocksIterator, RocksDB rocksDB) {
+      this.rocksIterator = rocksIterator;
+      this.rocksDB = rocksDB;
+    }
+
+    @Override
+    public void run() {
+      if (started.compareAndSet(true, false)) {
+        rocksDB.notifyIteratorClosed(rocksIterator);
+        synchronized (rocksDB.getRocksDB()) {
+          org.rocksdb.RocksDB _db = rocksDB.getRocksDB().get();
+          if (_db != null) {
+            rocksIterator.close();
+          }
+        }
+      }
+    }
+
+    void setStartedToFalse() {
+      started.set(false);
+    }
+
+    @VisibleForTesting
+    boolean getStarted() {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1815744276

   It is not very important, so I am fine with merging the PR without it being addressed if you are find with that !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1396633408


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,34 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();

Review Comment:
   Heh, forgot my own comment :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1397977149


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   I am trying to understand why this needs to be moved, was there any specific reason to do so ?
   If no, why not revert it back ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #43502: [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator
URL: https://github.com/apache/spark/pull/43502


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1398630510


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   Thanks @mridulm 
   @zhaomin1423 Is this ok for you?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1398629711


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##########
@@ -322,26 +323,15 @@ public void close() throws IOException {
     }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
-   */
-  void closeIterator(LevelDBIterator<?> it) throws IOException {
-    notifyIteratorClosed(it);
-    synchronized (this._db) {
-      DB _db = this._db.get();
-      if (_db != null) {
-        it.close();
-      }
-    }
-  }

Review Comment:
   (I am on vacation, but that sounds right @LuciferYang )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1395081107


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,34 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();

Review Comment:
   Yes, we have discussed this issue. The reason for not directly calling `this.cleaner.clean()` is because the close process in `Cleaner` has added the operation of `synchronized (this._db)`, which is slightly different from the semantics of the original `close()` method. For specific discussions, please refer to this thread: 
   
   https://github.com/apache/spark/pull/43502#discussion_r1372954706



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372539013


##########
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java:
##########
@@ -383,6 +385,51 @@ public void testSkipAfterDBClose() throws Exception {
     assertFalse(iter.skip(1));
   }
 
+  @Test
+  public void testResourceCleaner() throws Exception {
+    File dbPathForCleanerTest = File.createTempFile(
+      "test_db_cleaner.", ".rdb");
+    dbPathForCleanerTest.delete();
+
+    LevelDB dbForCleanerTest = new LevelDB(dbPathForCleanerTest);
+    try {
+      for (int i = 0; i < 8192; i++) {
+        dbForCleanerTest.write(createCustomType1(i));
+      }
+      LevelDBIterator<CustomType1> levelDBIterator = (LevelDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
+      Reference<LevelDBIterator<?>> reference = getRocksDBIteratorRef(levelDBIterator, dbForCleanerTest);
+      assertNotNull(reference);
+      DBIterator it = levelDBIterator.internalIterator();
+      // it has not been closed yet, hasNext execute normally.
+      assertTrue(it.hasNext());
+      // Manually set rocksDBIterator to null, to be GC.
+      levelDBIterator = null;
+      // 100 times gc, the rocksDBIterator should be GCed.
+      int count = 0;
+      while (count < 100 && !reference.refersTo(null)) {
+        System.gc();
+        count++;
+        Thread.sleep(100);
+      }
+      // check rocksDBIterator should be GCed
+      assertTrue(reference.refersTo(null));
+      // Verify that the Cleaner will be executed after a period of time, there is an exception message. java.lang.AssertionError: This object has been deleted
+      assertThrows(AssertionError.class, it::hasNext, "This object has been deleted");

Review Comment:
   Can we change to `JniDBIterator` to get the corresponding `NativeIterator` object and check the result of `NativeIterator.isAllocated`? This might require getting `NativeIterator` through reflection?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372540704


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +279,24 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);
+      });
+      synchronized (levelDB.getLevelDB()) {
+        DB _db = levelDB.getLevelDB().get();
+        if (_db == null) {
+          return;
+        }
+        try {
+          dbIterator.close();
+        } catch (IOException e) {
+          throw new RuntimeException(e);

Review Comment:
   ```suggestion
             throw new UncheckedIOException(e);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1372541888


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +188,16 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
+      cleanable.clean();;

Review Comment:
   ```
    if (!closed) {
         try {
           cleanable.clean();
         } catch (UncheckedIOException uncheckedIOE) {
           throw uncheckedIOE.getCause();
         } finally {
           closed = true;
           next = null;
         }
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1373003887


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   > looks like there is a leak in the DB, iterator's weak ref is never cleaned up from the tracker ?
   
   
   I think this should be a old problem, and we have also performed null checks in the close method of RocksDB.
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java#L341-L348
   
   https://github.com/apache/spark/blob/7d7afb06f682c10f3900eb8adeab9fad6d49cb24/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java#L307-L315
   
   If an iterator is no longer referenced and is not manually closed, then if GC occurs at this time, the iteratorTracker may still hold the Ref, but Ref#get is null.
   
   Should we change this condition to 
   
   ```
   return levelDBIterator == null || dbIterator == levelDBIterator.it; 
   ```
   
   Can this also clean up the invalid weakref in passing?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1780359330

   > Although the current pr only fixes part of RocksDB now, I suggest adding the LevelDB part in this PR once the RocksDB part is deemed feasible. I prefer to complete them in one.
   
   Added LevelDB part.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1374094014


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -280,4 +285,23 @@ static int compare(byte[] a, byte[] b) {
     return a.length - b.length;
   }
 
+  private record ResourceCleaner(DBIterator dbIterator, LevelDB levelDB) implements Runnable {
+    @Override
+    public void run() {
+      levelDB.getIteratorTracker().removeIf(ref -> {
+        LevelDBIterator<?> levelDBIterator = ref.get();
+        return levelDBIterator != null && dbIterator.equals(levelDBIterator.it);

Review Comment:
   I simplified the condition of rocksdb part  into `dbIterator == rocksDBIterator.it` for local testing, there will be many test failures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [spark]

Posted by "zhaomin1423 (via GitHub)" <gi...@apache.org>.
zhaomin1423 commented on PR #43502:
URL: https://github.com/apache/spark/pull/43502#issuecomment-1777494074

   > Although the current pr only fixes part of RocksDB now, I suggest adding the LevelDB part in this PR once the RocksDB part is deemed feasible. I prefer to complete them in one.
   
   ok, will add LevelDB part later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1378067392


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java:
##########
@@ -180,13 +183,20 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
+    db.notifyIteratorClosed(this);
     if (!closed) {
-      cleanable.clean();
+      it.close();

Review Comment:
   ```java
   try {
   it.close();
   } finally {
   closed = true;
   next = null;
   cancelResourceClean();
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1376996077


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +189,21 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        cleanable.clean();

Review Comment:
   Sounds good to me @LuciferYang, let us keep semantics as is. If there is a change in behavior we need, we can do it after.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1391355792


##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   This was a suggestion I made earlier. Line 214 sets the `started` status of `ResourceCleaner` to false, while line 215 simply triggers ResourceCleaner to manually execute and be removed from the Cleaner's pending cleanup queue as soon as possible. In fact, since the `started` status is already `false`, it won't perform any operations. Of course, we could also just set the status here and then wait for the `Cleaner` to execute the registered ResourceCleaner on its own :)



##########
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##########
@@ -182,23 +193,36 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-    db.notifyIteratorClosed(this);
+    db.notifyIteratorClosed(it);
     if (!closed) {
-      it.close();
-      closed = true;
-      next = null;
+      try {
+        it.close();
+      } catch (UncheckedIOException uncheckedIOException) {
+        throw uncheckedIOException.getCause();
+      } finally {
+        closed = true;
+        next = null;
+        cancelResourceClean();
+      }
     }
   }
 
   /**
-   * Because it's tricky to expose closeable iterators through many internal APIs, especially
-   * when Scala wrappers are used, this makes sure that, hopefully, the JNI resources held by
-   * the iterator will eventually be released.
+   * Prevent ResourceCleaner from actually releasing resources after close it.
    */
-  @SuppressWarnings("deprecation")
-  @Override
-  protected void finalize() throws Throwable {
-    db.closeIterator(this);
+  private void cancelResourceClean() {
+    this.resourceCleaner.setStartedToFalse();
+    this.cleanable.clean();

Review Comment:
   This was a suggestion I made earlier. Line 214 sets the `started` status of `ResourceCleaner` to false, while line 215 simply triggers ResourceCleaner to manually execute and be removed from the Cleaner's pending cleanup queue as soon as possible. In fact, since the `started` status is already `false`, it won't perform any operations. Of course, we could also just set the status here and then wait for the `Cleaner` to execute the registered ResourceCleaner on its own :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org