Posted to issues@lucene.apache.org by "zhaih (via GitHub)" <gi...@apache.org> on 2023/02/03 07:06:22 UTC

[GitHub] [lucene] zhaih opened a new pull request, #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

zhaih opened a new pull request, #12126:
URL: https://github.com/apache/lucene/pull/12126

   ### Description
   
   See #11885 
   
   ### Approach
   I extracted mainly the ref-counting part into the new `FileDeleter`, so that both `IndexFileDeleter` and `ReplicaFileDeleter` use it as a component. It does not provide any thread safety, since `IndexFileDeleter` was originally implemented in a lock-free way; users of the new `FileDeleter` are responsible for synchronization if necessary.
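
For illustration only, here is a minimal sketch of the split described above: a ref-counting component with no internal locking, plus a caller that adds its own synchronization. The class names `RefCountingSketch` and `SynchronizedCaller` and the `deleted` list are hypothetical and are not the API added in this pull request; a real implementation would delete through a `Directory` rather than record names in a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a ref-counting component; it does no locking of its own. */
class RefCountingSketch {
  // File name -> number of live references. A plain HashMap, so not thread-safe.
  private final Map<String, Integer> refCounts = new HashMap<>();
  // Records the files that would be deleted (assumption: stands in for real file deletion).
  final List<String> deleted = new ArrayList<>();

  void incRef(String fileName) {
    refCounts.merge(fileName, 1, Integer::sum);
  }

  void decRef(String fileName) {
    Integer count = refCounts.get(fileName);
    if (count == null) {
      throw new IllegalStateException("file \"" + fileName + "\" is not tracked");
    }
    if (count == 1) {
      // Last reference released: stop tracking and "delete" the file.
      refCounts.remove(fileName);
      deleted.add(fileName);
    } else {
      refCounts.put(fileName, count - 1);
    }
  }

  int getRefCount(String fileName) {
    return refCounts.getOrDefault(fileName, 0);
  }
}

/** A caller that needs thread safety guards every call with its own monitor. */
class SynchronizedCaller {
  private final RefCountingSketch deleter = new RefCountingSketch();

  synchronized void incRef(String fileName) {
    deleter.incRef(fileName);
  }

  synchronized void decRef(String fileName) {
    deleter.decRef(fileName);
  }

  synchronized List<String> deletedSnapshot() {
    return new ArrayList<>(deleter.deleted);
  }
}
```

The point of the design is that the component stays simple and each owner chooses its own locking strategy, or none if it is already confined to a single thread.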
   
   ### Test
   I haven't written any specific tests for this, since I feel the existing tests (mainly `TestIndexFileDeleter`) should already provide good coverage.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] sukiand commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "sukiand (via GitHub)" <gi...@apache.org>.
sukiand commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1105037958


##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -225,24 +222,18 @@ public IndexFileDeleter(
     // Now delete anything with ref count at 0.  These are
     // presumably abandoned files eg due to crash of
     // IndexWriter.
-    Set<String> toDelete = new HashSet<>();
-    for (Map.Entry<String, RefCount> entry : refCounts.entrySet()) {
-      RefCount rc = entry.getValue();
-      final String fileName = entry.getKey();
-      if (0 == rc.count) {
-        // A segments_N file should never have ref count 0 on init:
-        if (fileName.startsWith(IndexFileNames.SEGMENTS)) {
-          throw new IllegalStateException(
-              "file \"" + fileName + "\" has refCount=0, which should never happen on init");
-        }
-        if (infoStream.isEnabled("IFD")) {
-          infoStream.message("IFD", "init: removing unreferenced file \"" + fileName + "\"");
-        }
-        toDelete.add(fileName);
+    Set<String> toDelete = fileDeleter.getUnrefedFiles();
+    for (String fileName : toDelete) {
+      if (fileName.startsWith(IndexFileNames.SEGMENTS)) {
+        throw new IllegalStateException(
+            "file \"" + fileName + "\" has refCount=0, which should never happen on init");
+      }
+      if (infoStream.isEnabled("IFD")) {
+        infoStream.message("IFD", "init: removing unreferenced file \"" + fileName + "\"");
       }
     }
 
-    deleteFiles(toDelete);
+    fileDeleter.deleteFilesIfNoRef(toDelete);

Review Comment:
   We get `toDelete` from `fileDeleter` as well. Should this part of the code be encapsulated into `fileDeleter` as a whole, e.g. by calling `fileDeleter.clean()` or something similar?





[GitHub] [lucene] zhaih commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1129025441


##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -610,76 +601,34 @@ public void checkpoint(SegmentInfos segmentInfos, boolean isCommit) throws IOExc
     }
   }
 
+  private void logInfo(FileDeleter.MsgType msgType, String msg) {
+    if (msgType == FileDeleter.MsgType.REF && VERBOSE_REF_COUNTS == false) {

Review Comment:
   I think we could leave `VERBOSE_REF_COUNTS` as is, since `FileDeleter` is only responsible for emitting each message along with its type, and `VERBOSE_REF_COUNTS` is the switch on the `IFD` side, so I think it makes sense to let it stay there.
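
A rough sketch of that division of labor, under stated assumptions: the emitting component tags each message with a type and hands it to a callback, and the owning class decides what to print. Only `MsgType.REF`, `logInfo`, and `VERBOSE_REF_COUNTS` appear in the diff; the `FILE` constant, the `Emitter` and `Owner` classes, and the message strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class MessengerSketch {
  // REF appears in the diff above; FILE is an assumed second type for illustration.
  enum MsgType { REF, FILE }

  /** Emits typed messages and knows nothing about verbosity switches. */
  static class Emitter {
    private final BiConsumer<MsgType, String> messenger;

    Emitter(BiConsumer<MsgType, String> messenger) {
      this.messenger = messenger;
    }

    void incRef(String fileName) {
      messenger.accept(MsgType.REF, "IncRef \"" + fileName + "\"");
    }

    void delete(String fileName) {
      messenger.accept(MsgType.FILE, "delete \"" + fileName + "\"");
    }
  }

  /** The owner keeps the switch and filters on the message type. */
  static class Owner {
    static final boolean VERBOSE_REF_COUNTS = false;
    // Collected output; a real owner would write to its info stream instead.
    final List<String> log = new ArrayList<>();
    final Emitter emitter = new Emitter(this::logInfo);

    private void logInfo(MsgType msgType, String msg) {
      if (msgType == MsgType.REF && VERBOSE_REF_COUNTS == false) {
        return; // ref-count chatter is suppressed unless the switch is on
      }
      log.add(msg);
    }
  }
}
```

With the switch off, calling `emitter.incRef(...)` leaves the log untouched while `emitter.delete(...)` is recorded, so the filtering policy lives entirely on the owner's side.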



##########
lucene/core/src/java/org/apache/lucene/util/FileDeleter.java:
##########
@@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.NoSuchFileException;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.BiConsumer;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This class provides ability to track the reference counts of a set of index files and delete them
+ * when their counts decreased to 0.
+ *
+ * <p>This class is NOT thread-safe, the user should make sure the thread-safety themselves
+ */
+public class FileDeleter {

Review Comment:
   +1





[GitHub] [lucene] mikemccand commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1128355632


##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -154,7 +151,7 @@ public IndexFileDeleter(
                 || fileName.startsWith(IndexFileNames.PENDING_SEGMENTS))) {
 
           // Add this file to refCounts with initial count 0:
-          getRefCount(fileName);
+          fileDeleter.getRefCount(fileName);

Review Comment:
   Kinda weird that a method starting with `get` has this set-like side effect, sigh.



##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -610,76 +601,34 @@ public void checkpoint(SegmentInfos segmentInfos, boolean isCommit) throws IOExc
     }
   }
 
+  private void logInfo(FileDeleter.MsgType msgType, String msg) {
+    if (msgType == FileDeleter.MsgType.REF && VERBOSE_REF_COUNTS == false) {

Review Comment:
   Probably you could have just moved the `VERBOSE_REF_COUNTS` down into `FileDeleter`, but since this is a rote refactor it's good to separate / do that later.



##########
lucene/core/src/java/org/apache/lucene/util/FileDeleter.java:
##########
@@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.NoSuchFileException;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.BiConsumer;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This class provides ability to track the reference counts of a set of index files and delete them
+ * when their counts decreased to 0.
+ *
+ * <p>This class is NOT thread-safe, the user should make sure the thread-safety themselves
+ */

Review Comment:
   Can you add the `@lucene.internal` javadoc tag?  Users should not rely on the future stability of this API.



##########
lucene/CHANGES.txt:
##########
@@ -112,7 +112,8 @@ New Features
 
 Improvements
 ---------------------
-(No changes)
+* GITHUB#12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a public common utility class
+  FileDeleter. (Patrick Zhai)s

Review Comment:
   Is that trailing `s` by accident?



##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -154,7 +151,7 @@ public IndexFileDeleter(
                 || fileName.startsWith(IndexFileNames.PENDING_SEGMENTS))) {
 
           // Add this file to refCounts with initial count 0:
-          getRefCount(fileName);
+          fileDeleter.getRefCount(fileName);

Review Comment:
   We can maybe later / separately improve the naming, or add a dedicated `initRefCount(fileName)` method or so.
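
To make the naming concern concrete, here is a small sketch that separates the two roles: an explicit registration method and a read-only getter. This is an assumption about one possible shape, not the code in this pull request; the class name, the boolean return value, and the `exists` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class InitRefCountSketch {
  private final Map<String, Integer> refCounts = new HashMap<>();

  /**
   * Starts tracking the file with count 0 if it is not tracked yet.
   * Returns true if a new entry was created (assumed return value).
   */
  boolean initRefCount(String fileName) {
    return refCounts.putIfAbsent(fileName, 0) == null;
  }

  /** Pure read: never creates an entry as a side effect. */
  int getRefCount(String fileName) {
    return refCounts.getOrDefault(fileName, 0);
  }

  /** Whether the file is currently tracked at all. */
  boolean exists(String fileName) {
    return refCounts.containsKey(fileName);
  }
}
```

With this split, a call such as `getRefCount("_0.cfs")` on an untracked file reports 0 without registering it, and only `initRefCount` changes what is tracked.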



##########
lucene/core/src/java/org/apache/lucene/util/FileDeleter.java:
##########
@@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.NoSuchFileException;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.BiConsumer;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This class provides ability to track the reference counts of a set of index files and delete them
+ * when their counts decreased to 0.
+ *
+ * <p>This class is NOT thread-safe, the user should make sure the thread-safety themselves
+ */
+public class FileDeleter {

Review Comment:
   Maybe `final`?  This is such a crazy expert class -- it should only be used, not extended?





[GitHub] [lucene] zhaih commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1129027852


##########
lucene/core/src/java/org/apache/lucene/util/FileDeleter.java:
##########
@@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.NoSuchFileException;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.BiConsumer;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This class provides ability to track the reference counts of a set of index files and delete them
+ * when their counts decreased to 0.
+ *
+ * <p>This class is NOT thread-safe, the user should make sure the thread-safety themselves
+ */

Review Comment:
   Ah forgot that, will add!





[GitHub] [lucene] zhaih commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1099470288


##########
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/CopyJob.java:
##########
@@ -206,7 +206,7 @@ private synchronized void _transferAndCancel(CopyJob prevJob) throws IOException
       if (Node.VERBOSE_FILES) {
         dest.message("remove partial file " + prevJob.current.tmpName);
       }
-      dest.deleter.deleteNewFile(prevJob.current.tmpName);
+      dest.deleter.deleteIfNoRef(prevJob.current.tmpName);

Review Comment:
   Ah, I actually want to do it the opposite way. I'm not 100% sure why we need a `deleteNewFile` (force delete) here rather than `deleteIfNoRef`, but I don't want to introduce a (possibly) different behavior in this change, so I kept it.
   Although the name `deleteNewFile` is a bit misleading, so I changed it to `forceDeleteFile`.
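
The difference between the two deletion modes discussed here can be sketched as follows. The semantics shown are an assumption inferred from the method names and the "force delete" remark, not taken from the actual implementation; the class name and the `onDisk` set (a stand-in for a directory) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DeleteModesSketch {
  private final Map<String, Integer> refCounts = new HashMap<>();
  // Stand-in for the files present in a directory.
  final Set<String> onDisk = new HashSet<>();

  void incRef(String fileName) {
    refCounts.merge(fileName, 1, Integer::sum);
  }

  /** Deletes the file only if nothing references it; returns whether it was deleted. */
  boolean deleteIfNoRef(String fileName) {
    if (refCounts.getOrDefault(fileName, 0) > 0) {
      return false; // still referenced, leave it alone
    }
    refCounts.remove(fileName);
    return onDisk.remove(fileName);
  }

  /** Deletes the file regardless of its ref count and stops tracking it. */
  void forceDeleteFile(String fileName) {
    refCounts.remove(fileName);
    onDisk.remove(fileName);
  }
}
```

Under these assumed semantics, swapping one call for the other changes behavior exactly when the file is still referenced, which is why keeping the forced variant avoids a behavior change in a pure refactoring.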





[GitHub] [lucene] zhaih commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1105331268


##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -225,24 +222,18 @@ public IndexFileDeleter(
     // Now delete anything with ref count at 0.  These are
     // presumably abandoned files eg due to crash of
     // IndexWriter.
-    Set<String> toDelete = new HashSet<>();
-    for (Map.Entry<String, RefCount> entry : refCounts.entrySet()) {
-      RefCount rc = entry.getValue();
-      final String fileName = entry.getKey();
-      if (0 == rc.count) {
-        // A segments_N file should never have ref count 0 on init:
-        if (fileName.startsWith(IndexFileNames.SEGMENTS)) {
-          throw new IllegalStateException(
-              "file \"" + fileName + "\" has refCount=0, which should never happen on init");
-        }
-        if (infoStream.isEnabled("IFD")) {
-          infoStream.message("IFD", "init: removing unreferenced file \"" + fileName + "\"");
-        }
-        toDelete.add(fileName);
+    Set<String> toDelete = fileDeleter.getUnrefedFiles();
+    for (String fileName : toDelete) {
+      if (fileName.startsWith(IndexFileNames.SEGMENTS)) {
+        throw new IllegalStateException(
+            "file \"" + fileName + "\" has refCount=0, which should never happen on init");
+      }
+      if (infoStream.isEnabled("IFD")) {
+        infoStream.message("IFD", "init: removing unreferenced file \"" + fileName + "\"");
       }
     }
 
-    deleteFiles(toDelete);
+    fileDeleter.deleteFilesIfNoRef(toDelete);

Review Comment:
   Yeah, I was struggling with this as well. The reason I put it like this is that I feel the `IllegalStateException` here is quite important, so I don't want to throw it away; but if I put it into, say, `fileDeleter.clean()`, then it does not always make sense either.
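
One way to picture the trade-off: the generic component only reports and deletes unreferenced files, while the index-specific sanity check stays with the caller. The sketch below borrows the method names `getUnrefedFiles` and `deleteFilesIfNoRef` from the diff, but their bodies, the `track` helper, the `deleted` set, and both class names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Generic part: knows about ref counts, not about segments files. */
class UnrefedFilesSketch {
  private final Map<String, Integer> refCounts = new HashMap<>();
  // Records deleted names (assumption: stands in for real file deletion).
  final Set<String> deleted = new HashSet<>();

  void track(String fileName, int count) {
    refCounts.put(fileName, count);
  }

  Set<String> getUnrefedFiles() {
    Set<String> unrefed = new HashSet<>();
    for (Map.Entry<String, Integer> entry : refCounts.entrySet()) {
      if (entry.getValue() == 0) {
        unrefed.add(entry.getKey());
      }
    }
    return unrefed;
  }

  void deleteFilesIfNoRef(Set<String> fileNames) {
    for (String fileName : fileNames) {
      if (refCounts.getOrDefault(fileName, 0) == 0) {
        refCounts.remove(fileName);
        deleted.add(fileName);
      }
    }
  }
}

/** Caller-side policy: a segments file with ref count 0 on init is an error. */
class InitCleanupSketch {
  // Same prefix that IndexFileNames.SEGMENTS denotes; redefined here to stay self-contained.
  static final String SEGMENTS = "segments";

  static void cleanOnInit(UnrefedFilesSketch deleter) {
    Set<String> toDelete = deleter.getUnrefedFiles();
    for (String fileName : toDelete) {
      if (fileName.startsWith(SEGMENTS)) {
        throw new IllegalStateException(
            "file \"" + fileName + "\" has refCount=0, which should never happen on init");
      }
    }
    deleter.deleteFilesIfNoRef(toDelete);
  }
}
```

A generic `clean()` inside the component would have to either drop the check or embed index-specific knowledge that other users of the component may not want.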





[GitHub] [lucene] zhaih merged pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih merged PR #12126:
URL: https://github.com/apache/lucene/pull/12126




[GitHub] [lucene] jimmykobe1171 commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "jimmykobe1171 (via GitHub)" <gi...@apache.org>.
jimmykobe1171 commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1098024345


##########
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/CopyJob.java:
##########
@@ -206,7 +206,7 @@ private synchronized void _transferAndCancel(CopyJob prevJob) throws IOException
       if (Node.VERBOSE_FILES) {
         dest.message("remove partial file " + prevJob.current.tmpName);
       }
-      dest.deleter.deleteNewFile(prevJob.current.tmpName);
+      dest.deleter.deleteIfNoRef(prevJob.current.tmpName);

Review Comment:
   Seems like **deleteIfNoRef** is always safer than **deleteNewFile**. Do we still need the method deleteNewFile?





[GitHub] [lucene] zhaih commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1129016525


##########
lucene/CHANGES.txt:
##########
@@ -112,7 +112,8 @@ New Features
 
 Improvements
 ---------------------
-(No changes)
+* GITHUB#12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a public common utility class
+  FileDeleter. (Patrick Zhai)s

Review Comment:
   LOL yes a mistype





[GitHub] [lucene] zhaih commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class

Posted by "zhaih (via GitHub)" <gi...@apache.org>.
zhaih commented on code in PR #12126:
URL: https://github.com/apache/lucene/pull/12126#discussion_r1129023527


##########
lucene/core/src/java/org/apache/lucene/index/IndexFileDeleter.java:
##########
@@ -154,7 +151,7 @@ public IndexFileDeleter(
                 || fileName.startsWith(IndexFileNames.PENDING_SEGMENTS))) {
 
           // Add this file to refCounts with initial count 0:
-          getRefCount(fileName);
+          fileDeleter.getRefCount(fileName);

Review Comment:
   Yeah, I don't like that part either, it's quite tricky. The good thing is we're not abusing the old `get`, so I just followed your suggestion and created an `initRefCount`; feels better now LOL.


